Dialogue Mixing Tips


Video is a visual medium, but the audio side of a project is as important – often more important – than the picture side. When story context is based on dialogue, then the story will make no sense if you can’t hear or understand that spoken information. In theatrical mixes, it’s common for a three person team of rerecording mixers to operate the console for the final mix. Their responsibilities are divided into dialogue, sound effects, and music. The dialogue mixer is usually the team lead, precisely because intelligible dialogue is paramount to a successful motion picture mix. For this reason, dialogue is also mixed as primarily mono coming from the center speaker in a 5.1 surround set-up.

A lot of my work includes documentary-style entertainment and corporate projects, which frequently lean on recorded interviews to tell the story. In many cases, sending the mix outside isn’t in the budget, which means that mix falls to me. You can mix in a DAW or in your NLE. Many video editors are intimidated by or unfamiliar with ProTools or Logic Pro X – or even the Fairlight page in DaVinci Resolve. Rest assured that every modern NLE is capable of turning out an excellent stereo mix for the purposes of TV, web, or mobile viewing. Given the right monitoring and acoustic environment, you can also turn out solid LCR or 5.1 surround mixes, adequate for TV viewing.

I have covered audio and mix tips in the past, especially when dealing with Premiere. The following are a few more pointers.

Original location recording

You typically have no control over the original sound recording. On many projects, the production team will have recorded double-system sound controlled by a separate location mixer (recordist). They generally use two microphones on the subject – a lav and an overhead shotgun/boom mic.

The lav will often be tucked under clothing to filter out ambient noise from the surrounding environment and to hide it from the camera. This will sound closer, but may also sound a bit muffled. There may also be occasional clothes rustle from the clothing rubbing against the mic as the speaker moves around. For these reasons I will generally select the shotgun as the microphone track to use. The speaker’s voice will sound better and the recording will tend to “breathe.” The downside is that you’ll also pick up more ambient noise, such as HVAC fans running in the background. Under the best of circumstances these will be present during quiet moments, but not too noticeable when the speaker is actually talking.


The first stage of any dialogue processing chain or workflow is noise reduction and gain correction. At the start of the project you have the opportunity to clean up any raw voice tracks. This is ideal, because it saves you from having to do that step later. In the double-system sound example, you have the ability to work with the isolated .wav file before syncing it within a multicam group or as a synchronized clip.

Most NLEs feature some audio noise reduction tools and you can certainly augment these with third party filters and standalone apps, like those from iZotope. However, this is generally a process I will handle in Adobe Audition, which can process single tracks, as well as multitrack sessions. Audition starts with a short noise print (select a short quiet section in the track) used as a reference for the sounds to be suppressed. Apply the processing and adjust settings if the dialogue starts sounding like the speaker is underwater. Leaving some background noise is preferable to over-processing the track.

Once the noise reduction is where you like it, apply gain correction. Audition features an automatic loudness match feature or you can manually adjust levels. The key is to get the overall track as loud as you can without clipping the loudest sections and without creating a compressed sound. You may wish to experiment with the order of these processes. For example, you may get better results adjusting gain first and then applying the noise reduction afterwards.

After both of these steps have been completed, bounce out (export) the track to create a new, processed copy of the original. Bring that into your NLE and combine it with the picture. From here on, anytime you cut to that clip, you will be using the synced, processed audio.

If you can’t go through such a pre-processing step in Audition or another DAW, then the noise reduction and correction must be handled within your NLE. Each of the top NLEs includes built-in noise reduction tools, but there are plenty of plug-in offerings from Waves, iZotope, Accusonus, and Crumplepop to name a few. In my opinion, such processing should be applied on the track (or audio role in FCPX) and not on the clip itself. However, raising or lowering the gain/volume of clips should be performed on the clip or in the clip mixer (Premiere Pro) first.

Track/audio role organization

Proper organization is key to an efficient mix. When a speaker is recorded multiple times or at different locations, then the quality or tone of those recordings will vary. Each situation may need to be adjusted differently in the final mix. You may also have several speakers interviewed at the same time in the same location. In that case, the same adjustments should work for all. Or maybe you only need to separate male from female speakers, based on voice characteristics.

In a track-based NLE like Media Composer, Resolve, Premiere Pro, or others, simply place each speaker onto a separate track so that effects processing can be specific for that speaker for the length of the program. In some cases, you will be able to group all of the speaker clips onto one or a few tracks. The point is to arrange VO, sync dialogue, sound effects, and music together as groups of tracks. Don’t intermingle voice, effects, or music clips onto the same tracks.

Once you have organized your clips in this manner, then you are ready for the final mix. Unfortunately this organization requires some extra steps in Final Cut Pro X, because it has no tracks. Audio clips in FCPX must be assigned specific audio roles, based on audio types, speaker names, or any other criteria. Such assignments should be applied immediately upon importing a clip. With proper audio role designations, the process can work quite smoothly. Without it, you are in a world of hurt.

Since FCPX has no traditional track mixer, the closest equivalent is to apply effects to audio lanes based on the assigned audio roles. For example, all clips designated as dialogue will have their audio grouped together into the dialogue lane. Your sequence (or just the audio) must first be compounded before you are able to apply effects to entire audio lanes. This effectively applies these same effects to all clips of a given audio role assignment. So think of audio lanes as the FCPX equivalent to audio tracks in Premiere, Media Composer, or Resolve.

The vocal chain

The objective is to get your dialogue tracks to sound consistent and stand out in the mix. To do this, I typically use a standard set of filter effects. Noise reduction processing is applied either through preprocessing (described above) or as the first plug-in filter applied to the track. After that, I will typically apply a de-esser and a plosive remover. The first reduces the sibilance of the spoken letter “s” and the latter reduces mic pops from the spoken letter “p.” As with all plug-ins, don’t get heavy-handed with the effect, because you want to maintain a natural sound.

You will want the audio – especially interviews – to have a consistent level throughout. This can be done manually by adjusting clip gain, either clip by clip, or by rubber banding volume levels within clips. You can also apply a track effect, like an automatic volume filter (Waves, Accusonus, Crumplepop, other). In some cases a compressor can do the trick. I like the various built-in plug-ins offered within Premiere and FCPX, but there are a ton of third-party options. I may also apply two compression effects – one to lightly level the volume changes, and the second to compress/limit the loudest peaks. Again, the key is to apply light adjustments, because I will also compress/limit the master output in addition to these track effects.

The last step is equalization. A parametric EQ is usually the best choice. The objective is to assure vocal clarity by accentuating certain frequencies. This will vary based on the sound quality of each speaker’s voice. This is why you often separate speakers onto their own tracks according to location, voice characteristics, and so on. In actual practice, only two to three tracks are usually needed for dialogue. For example, interviews may be consistent, but the voice-over recordings require a different touch.

Don’t get locked into the specific order of these effects. What I have presented in this post isn’t necessarily gospel for the hierarchical order in which to use them. For example, EQ and level adjusting filters might sound best when placed at different positions in this stack. A certain order might be better for one show, whereas a different order may be best the next time. Experiment and listen to get the best results!

©2020 Oliver Peters

Time to Rethink ProRes RAW?

The Apple ProRes RAW codec has been available for several years at this point, yet we have not heard of any professional cinematography camera adding the ability to record ProRes RAW in-camera. I covered ProRes RAW with some detail in these three blog posts (HDR and RAW Demystified, Part 1 and Part 2, and More about ProRes RAW) back in 2018. But the industry has changed over the past few years. Has that changed any thoughts about ProRes RAW?

Understanding RAW

Today’s video cameras evolved their sensor design from a three CCD array for RGB into a single sensor, similar to those used in still photo cameras. Most of these sensors are built using a Bayer pattern of photosites. This pattern is an array of monochrome receptors that are filtered to receive incoming green, red, and blue wavelengths of light. Typically the green photosites cover 50% of this pattern and red and blue each cover 25%. These photosites capture linear light, which is turned into data that is then meshed and converted into RGB pixel information. Lastly, it’s recorded into a video format. Photosites do not correlate in a 1:1 relationship with output pixels. You can have more or fewer total photosite elements in the sensor than the recorded pixel resolution of the file.

The process of converting photosite data into RGB video pixels is done by the camera’s internal electronics. This process also includes scaling, gamma encoding (Rec709, Rec 2020, or log), noise reduction, image sharpening, and the application of that manufacturer’s proprietary color science. The term “color science” implies some type of neutral mathematical color conversion, but that isn’t the case. The color science that each manufacturer uses is in fact their own secret sauce. It can be neutral or skewed in favor of certain colors and saturation levels. ARRI is a prime example of this. They have done a great job in developing a color profile for their Alexa line of cameras that approximates the look of film.

All of this image processing adds cost, weight, and power demands to the design of a camera. If you offload the processing to another stage in the pipeline, then design options are opened up. Recording camera raw image data achieves that. Camera raw is the monochrome sensor data prior to the conversion into an encoded video signal. By recording a camera raw file instead of an encoded RGB video file, you defer the processing to post.

To decode this file, your operating system or application requires some type of framework, plug-in, or decoding/developing software in order to properly interpret that data into a color image. In theory, using a raw file in post provides greater control over ISO/exposure and temperature/tint values in color grading. Depending on the manufacturer, you may also apply a variety of different camera profiles. All of this is possible and still have a camera file that is of a smaller size than its encoded RGB counterpart.

In-camera recording, camera raw, and RED

Camera raw recording preceded the introduction of the RED One camera. These usually consisted of uncompressed movie files or image sequences recorded to an external recorder. RED introduced the ability to record a Wavelet-compressed, 4K camera raw signal at 24fps. This was a movie file recorded onboard the camera itself. RED was granted a number of patents around these processes, which preclude any other camera manufacturer from doing that exact same thing, unless entering into a licensing agreement with RED. So far these patents have been successfully upheld against Sony and Apple among others.

In 2007 – part way through the Final Cut Pro product run – Apple introduced its family of ProRes codecs. ProRes was Apple’s answer to Avid’s DNxHD codec, but with some improvements, like resolution independence. ProRes not only became Apple’s default intermediate codec, but also gained stature as the mastering and delivery codec of choice, regardless of which NLE you were using.

By 2010 Apple was successful in convincing ARRI to use ProRes as its internal recording codec with the introduction of the (then new) line of Alexa cameras. (ARRI camera raw recording was a secondary option using ARRIRAW and a Codex recorder.) Shooting with an Alexa, recording high-quality ProRes files, and posting those directly within FCP or any other compatible NLE created the simplest and smoothest capture-edit-deliver pipeline of any professional post workflow. That remains unchanged even today.

Despite ARRI’s success, only a few other camera manufacturers have adopted ProRes as an internal recording option. To my knowledge these include some cameras from AJA, JVC, Blackmagic Design, and RED (as a secondary file to REDCODE). The lack of widespread adoption is most likely due to Apple’s licensing arrangement, coupled with the fact that ProRes is a proprietary Apple format. It may be a de facto industry standard, but it’s not an official standard sanctioned by an industry standards committee.

The introduction of Apple’s ProRes RAW codecs has led many in the industry to wait with bated breath for cameras to also adopt ProRes RAW as their internal camera raw option. ARRI would obviously be a candidate. However, the RED patents would seem to be an impediment. But what if Apple never had that intention in the first place?

Do we have it all wrong?

When Apple introduced ProRes RAW, it did so in partnership with Atomos. Just like Sony, ARRI, and Panasonic recording their camera raw signals to an external recorder, sending a camera raw signal to an external Atomos monitor/recorder is a viable alternative to in-camera recording. Atomos’ own disagreements with RED have now been settled. Therefore, embedding the ProRes RAW codec into their products opens up that recording format to any camera manufacturer. The camera simply has to be capable of sending a compatible camera raw signal (as data) over SDI or HDMI to the connected Atomos recorder.

The desire to see ProRes RAW in-camera stems from the history of ProRes adoption by ARRI and the impact that had on high-end production and post. However, that came at a time when Apple was pushing harder into various pro film and video markets. As we’ve learned, that course was corrected by Steve Jobs, leading to the launch of Final Cut Pro X. Apple has always been about ease and democratization – targeting the middle third of a bell curve of users, not necessarily the top or bottom thirds. For better or worse, Final Cut Pro X refocused Apple’s pro video direction with that in mind.

In addition, during this past decade or more, Apple has also changed its approach to photography. Aperture was a tool developed with semi-pro and pro DSLR photographers in mind. Traditional DSLRs have lost photography market share to smart phones – especially the iPhone. Online sharing methods – Facebook, Flickr, Instagram, cloud picture libraries – have become the norm over the traditional photo album. And so, Aperture bit the dust in favor of Photos. From a corporate point-of-view, the rethinking of photography cannot be separated from Apple’s rethinking of all things video.

Final Cut Pro X is designed to be forward-thinking, while cutting the chord with many legacy workflows. I believe the same can be applied to ProRes RAW. The small form factor camera, rigged with tons of accessories including external displays, is probably more common these days than the traditional, shoulder-mounted, one-piece camcorder. By partnering with Atomos (and maybe others in the future), Apple has opened the field to a much larger group of cameras than handling the task one camera manufacturer at a time.

ProRes RAW is automatically available to cameras that were previously stuck recording highly-compressed M-JPEG or H.264/265 formats. Video-enabled DSLRs from manufacturers like Nikon and Fujifilm join Canon and Panasonic cinematography cameras. Simply send a camera raw signal over HDMI to an Atomos recorder. And yet, it doesn’t exclude a company like ARRI either. They simply need to enable Atomos to repack their existing camera raw signal into ProRes RAW.

We may never see a camera company adopt onboard ProRes RAW and it doesn’t matter. From Apple’s point-of-view and that of FCPX users, it’s all the same. Use the camera of choice, record to an Atomos, and edit as easily as with regular ProRes. Do you have the depth of options as with REDCODE RAW? No. Is your image quality as perfect in an absolute (albeit non-visible) sense as ARRIRAW? Probably not. But these concerns are for the top third of users. That’s a category that Apple is happy to have, but not crucial to their existence.

The bottom line is that you can’t apply classic Final Cut Studio/ProRes thinking to Final Cut Pro X/ProRes RAW in today’s Apple. It’s simply a different world.



The images I’ve used in this post come from Patrik Pettersson. These clips were filmed with a Nikon Z6 DSLR recording to an Atomos Ninja V. He’s made a a few sample clips available for download and testing. More at this link. This brings up an interesting issue, because most other forms of camera raw are tied to a specific camera profile. But with ProRes RAW, you can have any number of cameras. Once you bring those into Final Cut Pro X, you don’t have the correct camera profile with a color science that matches that model for each any every camera.

In the case of these clips, FCPX doesn’t offer any Nikon profiles. I decided to decode the clip (RAW to log conversion) using a Sony profile. This gave me the best possible results for the Nikon images and effectively gives me a log clip similar to that from a Sony camera. Then for the grade I worked in Color Finale Pro 2, using its ACES workflow. To complete the ACES workflow, I used the matching SLog3 conversion to Rec709.

The result is nice and you do have a number of options. However, the workflow isn’t as straightforward as Apple would like you to believe. I think these are all solvable challenges, but 1) Apple needs to supply the proper camera profiles for each of the compatible cameras; and 2) Apple needs to publish proper workflow guides that are useful to a wide range of users.

©2020 Oliver Peters

A First Look at Postlab Cloud

Apple developed Final Cut Pro X around single-editor workflows. As such, professional editing teams who wanted to use this tool for collaborative editing have been challenged to develop their own solutions. One approach was Postlab, which was developed in-house at Dutch broadcaster Evanglische Omroep (EO). In order to expand the product as a commercial application, lead developer Jasper Siegers decided to move it under the Hedge umbrella. This required the app to be rebuilt with new code before it could be offered to the FCPX market. That time has come and Postlab is now available as Postlab Cloud.

As the name implies, Postlab Cloud hosts your FCPX libraries “in the cloud,” i.e. on Postlab’s servers. Some production companies or broadcasters are reticent to have their editing computers connected online, but it’s important to note that only libraries and no media or caches are hosted by Postlab. This keeps the transfer times fast and file sizes light. Cache and media files stay local, whether on your machine or on connected shared storage. Postlab sets up accounts based on site licenses and numbers of users. Each user is assigned a log-in based on an e-mail address and a password. This means that a production hosted by Postlab can be accessed by authorized users anywhere in the world, provided there’s a viable internet connection.

The owner of the account can set up productions and organize them within folders. Each production is a collection or bundle of one or more Final Cut Pro X libraries. If you have ever worked with Final Cut Server in the FCP7 days, then the Postlab workflow is very similar. Once a production has been created, an editor can log in, download the library (a check-out step), edit in it, and then upload the changed version (a check-in step). As part of this upload, the Postlab interface prompts you to add comments describing the work you’ve done. Only one editor at a time can download a library and have write access; however, other users can still download it with read-only access. If you have two editors ping-ponging work on the same library file, then one has to upload it (check in) before the other editor can download it (check out) for their edits.

Getting started

I decided to test Postlab Cloud in two scenarios: a) multiple workstations connected to a shared storage network, and b) two disconnected editors collaborating over long distances. To start, once an account has been established, any editor using Postlab Cloud must install the small Postlab application. Since the app controls some of Final Cut’s functions, you will be prompted to enable GUI Scripting in your privacy preferences. In order for Postlab to work properly, media and cache files need to be outside of the library bundle. When you first download a library, you may be prompted to change your settings. In a networked environment with media on shared storage, the path to the media should be the same on each workstation. This means when Editor A finishes and checks in the production and then Editor B checks it back out, you generally will not need to relink the media files on Editor B’s system. Therefore, this edit collaboration can proceed fluidly.

Once a production has been downloaded, the library file exists as a temporary file on the local machine and not the network. This means that Postlab can still work in tandem with storage solutions that don’t normally perform well with FCPX libraries. In addition to this temporary library file, the Final Cut backup library is also stored in the location you have designated. If you are working in a networked, collaborative environment, then the advantage Postlab offers is version tracking and the ability for multiple users to open a library file (only one with write access).

Long distance

The second scenario is working with other editors outside of your facility. The first step is to get the media to the outside editor. You could certainly send a drive, but that isn’t efficient in time nor cost, especially across continents. If you only need creative editing and not finishing services, then low-res, proxy files are fine. So I converted my 4K UHD ProRes HQ files to 960 x 540 H264 (3Mbps) files and used Frame.io to transfer them over the internet. The key to proper relinking when you are done is to set audio to pass-through when converting this files. This was a double-system sound shoot, so I uploaded both the H264 videos files and the sound recordist’s WAV files to Frame and then downloaded them again at the other end (my home). Now I had media in both locations. The process would be the same even if it were two editors in two different countries.

The first Postlab step is to create and upload this FCPX library. Once that has been established, any authorized user with a Postlab log-in can access the production. I decided to go back and forth on this production between my home and the facility and also using different user log-ins – thus simulating a team of remote editors. Each time I did this, version changes were tracked by Postlab. If I were working with multiple editors, I would have been able to see what tasks each had performed.

It’s important to note that when you collaborate in this way, each editor should be using the same effects, LUTs, and Motion templates, otherwise some things will appear offline. Since the path to the media was different at home versus at the facility, each time I went between the two, checking in and then checking out the production, media files would appear offline. A simple relink fixed this, but it’s something to be aware of. Once totally done, I could relink to the high-res camera files and “finish” the project back at the office.


When you upload a library back to Postlab, that open FCPX library is closed within Final Cut Pro X on your system, because you have checked it back in. Once you log out of Postlab, the temporary library file is moved to the trash. If you need a local version of the library, then export it from the Postlab app.

Once you get the hang of it, collaboration is simple using Postlab Cloud. Library files stay light without any of the sort of corruption caused by using services like DropBox. My test project included synchronized multi-cam clips and multi-channel audio. Each time during this exchange clips, projects, and edits showed up as expected when going between the various users. Whether or not Apple ever tackles collaboration within Final Cut Pro X is an unknown. But why wait? If you need that today, then Postlab Cloud offers a solid answer.

The relaunched Postlab Cloud includes three plans, which are priced per user/per year: Postlab, Postlab Pro, and Postlab Server. The first tier only allows for library version tracking and sharing. Pro allows for a lot more libraries to be shared and comes with more features. Server is a dedicated Postlab Cloud server for larger teams or those that require IT-specific features like Active Directory. Finally, Hedge/Postlab plans to ship a local version of Postlab – designed for use within local networks – soon after launch.

Postlab has now expanded to include Premiere Pro users.

Check out the Postlab tutorials for more information.

The article was originally written for FCP.co.

©2020 Oliver Peters

ADA Compliance

The Americans with Disabilities Act (ADA) has enriched the lives of many in the disabled community since its introduction in 1990. It affects all of our lives, from wheelchair-friendly ramps on street corners and business entrances to the various accessibility modes in our computers and smart devices. While many editors don’t have to deal directly with the impact of the ADA on media, the law does affect broadcasters and streaming platforms. If you deliver commercials and programs, then your production will be affected in one way or another. Typically the producer is not directly subject to compliance, but the platform is. This means someone has to provide the elements that complete compliance as part of any distribution arrangement, whether it is the producer or the outlet itself.

Two components are involved to meet proper ADA compliance: closed captions and described audio (aka audio descriptions). Captions come in two flavors – open and closed. Open captions or subtitles consists of text “burned” into the image. It is customarily used when a foreign language is spoken in an otherwise English program (or the equivalent in non-English-speaking countries). Closed captions are enclosed in a data stream that can be turned on and off by the viewer, device, or the platform and are intended to make the dialogue accessible to the hearing-impaired. Closed captions are often also turned on in noisy environments, like a TV playing in a gym or a bar.

Audio descriptions are intended to aid the visually-impaired. This is a version of the audio mix with an additional voice-over element. An announcer describes visual information that is not readily obvious from the audio of the program itself. This voice-over fills in the gaps, such as “man climbs to the top of a large hill” or “logos appear on screen.”

Closed captions

Historically post houses and producers have opted to outsource caption creation to companies that specialize in those services. However, modern NLEs enable any editor to handle captions themselves and the increasing enforcement of ADA compliance is now adding to the deliverable requirements for many editors. With this increased demand, using a specialist may become cost prohibitive; therefore, built-in tools are all the more attractive.

There are numerous closed caption standards and various captioning file formats. The most common are .scc (Scenarist), .srt (SubRip), and .vtt (preferred for the web). Captions can be supplied as “embedded” (secondary data within the master file) or as a separate “sidecar” file, which is intended to play in sync with the video file. Not all of these are equal. For example, .scc files (embedded or as sidecar files) support text formatting and positioning, while .srt and .vtt do not. For example, if you have a lower-third name graphic come on screen, you want to move any caption from its usual lower-third, safe-title position to the top of the screen while that name graphic is visible. This way both remain legible. The .scc format supports that, but the other two don’t. The visual appearance of the caption text is a function of the playback hardware or software, so the same captions look different in QuickTime Player versus Switch or VLC. In addition, SubRip (.srt) captions all appear at the bottom, even if you repositioned them to the top, while .vtt captions appear at the top of the screen.

You may prefer to first create a transcription of the dialogue using an outside service, rather than simply typing in the captions from scratch. There are several online resources that automate speech-to-text, including SpeedScriber, Simon Says, Transcriptive, and others. Since AI-based transcription is only as good as the intelligibility of the audio and dialects of the speakers, they all require further text editing/correction through on online tool before they are ready to use.

One service that I’ve used with good results is REV.com, which uses human transcribers for greater accuracy, as well as offering on online text editing tool. The transcription can be downloaded in various formats, including simple text (.txt). Once you have a valid transcription, that file can be converted through a variety of software applications into .srt, .scc, or .vtt files. These in turn can be imported into your preferred NLE for timing, formatting, and positioning adjustments.

Getting the right look

There are guidelines that captioning specialists follow, but some are merely customary and do not affect compliance. For example, upper and lower case text is currently the norm, but you’ll still be OK if your text is all caps. There are also accepted norms when English (or other) subtitles appear on screen, such as for someone speaking in a foreign language. In those cases, no additional closed caption text is used, since the subtitle already provides that information. However, a caption may appear at the top of the screen identifying that a foreign language is being spoken. Likewise, during sections with only music or ambient sounds, a caption may briefly identifying it as such.

When creating captions, you have to understand that readability is key, so the text will not always run perfectly in sync with the dialogue. For instance, when two actors engage in rapid fire dialogue, each caption may stay on longer than the spoken line. You can adjust the timing against that scene so that they eventually catch up once the pace slows down. It’s good to watch a few captioned programs before starting from scratch – just to get a sense of what works and what doesn’t.

If you are creating captions for a program to run on a specific broadcast network or streaming services, then it’s a good idea to find out of they provide a style guide for captions.

Using your NLE to create closed captions

Avid Media Composer, Adobe Premiere Pro, DaVinci Resolve, and Apple Final Cut Pro X all support closed captions. I find FCPX to be the best of this group, because of its extensive editing control over captions and ease of use. This includes text formatting, but also display methods, like pop-on, paint-on, and roll-up effects. Import .scc files for maximum control or extract captions from an existing master, if your media already has embedded caption data. The other three NLEs place the captions onto a single data track (like a video track) within which captions can be edited. Final Cut Pro X places them as a series of connected clips, like any other video clip or graphic. If you perform additional editing, the FCPX magnetic timeline takes care of keeping the captions in sync with the associated dialogue.

Final Cut’s big plus for me is that validation errors are flagged in red. Validation errors occur when caption clips overlap, may be too short for the display method (like a paint-on), are too close to the start of the file, or other errors. It’s easy to find and fix these before exporting the master file.


NLEs support the export of a master file with embedded captions, or “burned” into the video as a subtitle, or the captions exported as a separate sidecar file. Specific format support for embedded captions varies among applications. For example, Premiere Pro – as well as Adobe Media Encoder – will only embed captioning data when you export your sequence or encode a file as a QuickTime-wrapped master file. (I’m running macOS, so there may be other options with Windows.)

On the other hand, Apple Compressor and Final Cut Pro X can encode or export files with embedded captions for formats such as MPEG2 TS, MPEG 2 PS, or MP4. It would be nice if all these NLEs supported the same range of formats, but they don’t. If your goal is a sidecar caption file instead of embedded data, then it’s a far simpler and more reliable process.

Audio descriptions

Compared to closed captions, providing audio description files is relatively easy. These can either be separate audio files – used as sidecar files for secondary audio – or additional tracks on the delivery master. Sometimes it’s a completely separate video file with only this version of the mix. Advanced platforms like Netflix may also require an IMF (Interoperable Master Format) package, which would include an audio description track as part of that package. When audio sidecar files are requested for the web or certain playback platforms, like hotel TV systems, the common deliverable formats are .mp3 or .m4a. The key is that the audio track should be able to run in sync with the rest of the program.

Producing an audio description file doesn’t require any new skills. A voice-over announcer is describing any action that occurs on screen, but which wouldn’t otherwise make sense if you were only listening to audio without that. Think of it like a radio play or podcast version of your TV program. This can be as simple as fitting additional VO into the gaps between actor/host/speaker dialogue. If you have access to the original files (such as a Pro Tools session) or dialogue/music/effects stems, then you have some latitude to adjust audio elements in order to fit in the additional voice-over lines. For example, sometimes the off-camera dialogue may be moved or edited in order to make more space for the VO descriptions. However, on-camera/sync dialogue is left untouched. In that case, some of this audio may be muted or ducked to make space for even longer descriptions.

Some of the same captioning service providers also provide audio description services, using their pool of announcers. Yet, there’s nothing about the process that any producer or editor couldn’t handle themselves. For example, scripting the extra lines, hiring and directing talent, and producing the final mix only require a bit more time added to the schedule, yet permits the most creative control.

ADA compliance has been around since 1990, but hasn’t been widely enforced outside of broadcast. That’s changing and there are no more excuses with the new NLE tools. It’s become easier than ever for any editor or producer to make sure they can provide the proper elements to touch every potential viewer.

For additional information, consult the FCC guidelines on closed captions.

The article was originally written for Pro Video Coalition.

©2020 Oliver Peters

Video Technology 2020 – Shared Storage

Shared storage used to be the domain of “heavy iron” facilities with Avid, Facilis, and earlier Apple Xserve systems providing the horsepower. Thanks to advances in networking and Ethernet technology, shared storage is accessible to any user. Whether built-in or via adapters, modern computers can tap into 1Gbps, 10Gbps, and even higher, networking speeds. Most computers can natively access Gigabit Ethernet networks (1Gbps) – adequate for SD and HD workflows. Computers designed for the pro video market increasingly sport built-in 10GbE ports, enabling comfortable collaboration with 4K media and up. Some of today’s most popular shared storage vendors include QNAP, Synology, and LumaForge.

This technology will become more prolific in 2020, with systems easier to connect and administer, making shared storage as plug-and-play as any local drives. Network Attached Storage (NAS) systems can service a single workstation or multiple users. In fact, companies like QNAP even offer consumer versions of these products designed to operate as home media servers. Even LumaForge sells a version of its popular Jellyfish through the online Apple Store. A simple, on-line connection guide will get you up and running, no IT department required. This is ideal for the individual editor or small post shop.

Expect 2020 to see higher connection speeds, such as 40GbE, and NAS proliferation even more widespread. It’s not just a matter of growth. These vendors are also interested in extending the functionality of their products beyond being a simple bucket for media. NAS systems will become full-featured media hubs. For example, if you an Avid user, you are familiar with their Media Central concept. In essence, this means the shared storage solution is a platform for various other applications, including the editing software. There are additional media applications that include management apps for user permission control, media queries, and more. Like Avid, the other vendors are exploring similar extensibility through third-party apps, such as Axle Video, Kyno, Hedge, Frame.io, and others. As such, a shared network becomes the whole that is greater than the sum of its parts.

Along with increased functionality, expect changes in the hardware, too. Modern NAS hardware is largely based on RAID arrays with spinning mechanical drives. As solid state (SSD) storage devices become more affordable, many NAS vendors will offer some of their products featuring RAID arrays configured with SSDs or even NVMe systems. Or a mixture of the two, with the SSD-based units used for short-term projects or cache files. Eventually the cost will come down enough so that large storage volumes can be cost-effectively populated with only SSDs. Don’t expect to be purchasing 100TB of SSD storage at a reasonable price in 2020; however, that is the direction in which we are headed. At least in this coming year, mechanical drives will still rule. Nevertheless, start looking at some percentage of your storage inventory to soon be based on SSDs.

Click here for more on shared storage solutions.

Originally written for Creative Planet Network.

©2020 Oliver Peters