Video is a visual medium, but the audio side of a project is as important – often more important – than the picture side. When story context is based on dialogue, then the story will make no sense if you can’t hear or understand that spoken information. In theatrical mixes, it’s common for a three person team of rerecording mixers to operate the console for the final mix. Their responsibilities are divided into dialogue, sound effects, and music. The dialogue mixer is usually the team lead, precisely because intelligible dialogue is paramount to a successful motion picture mix. For this reason, dialogue is also mixed as primarily mono coming from the center speaker in a 5.1 surround set-up.
A lot of my work includes documentary-style entertainment and corporate projects, which frequently lean on recorded interviews to tell the story. In many cases, sending the mix outside isn’t in the budget, which means that mix falls to me. You can mix in a DAW or in your NLE. Many video editors are intimidated by or unfamiliar with ProTools or Logic Pro X – or even the Fairlight page in DaVinci Resolve. Rest assured that every modern NLE is capable of turning out an excellent stereo mix for the purposes of TV, web, or mobile viewing. Given the right monitoring and acoustic environment, you can also turn out solid LCR or 5.1 surround mixes, adequate for TV viewing.
Original location recording
You typically have no control over the original sound recording. On many projects, the production team will have recorded double-system sound controlled by a separate location mixer (recordist). They generally use two microphones on the subject – a lav and an overhead shotgun/boom mic.
The lav will often be tucked under clothing to filter out ambient noise from the surrounding environment and to hide it from the camera. This will sound closer, but may also sound a bit muffled. There may also be occasional clothes rustle from the clothing rubbing against the mic as the speaker moves around. For these reasons I will generally select the shotgun as the microphone track to use. The speaker’s voice will sound better and the recording will tend to “breathe.” The downside is that you’ll also pick up more ambient noise, such as HVAC fans running in the background. Under the best of circumstances these will be present during quiet moments, but not too noticeable when the speaker is actually talking.
The first stage of any dialogue processing chain or workflow is noise reduction and gain correction. At the start of the project you have the opportunity to clean up any raw voice tracks. This is ideal, because it saves you from having to do that step later. In the double-system sound example, you have the ability to work with the isolated .wav file before syncing it within a multicam group or as a synchronized clip.
Most NLEs feature some audio noise reduction tools and you can certainly augment these with third party filters and standalone apps, like those from iZotope. However, this is generally a process I will handle in Adobe Audition, which can process single tracks, as well as multitrack sessions. Audition starts with a short noise print (select a short quiet section in the track) used as a reference for the sounds to be suppressed. Apply the processing and adjust settings if the dialogue starts sounding like the speaker is underwater. Leaving some background noise is preferable to over-processing the track.
Once the noise reduction is where you like it, apply gain correction. Audition features an automatic loudness match feature or you can manually adjust levels. The key is to get the overall track as loud as you can without clipping the loudest sections and without creating a compressed sound. You may wish to experiment with the order of these processes. For example, you may get better results adjusting gain first and then applying the noise reduction afterwards.
After both of these steps have been completed, bounce out (export) the track to create a new, processed copy of the original. Bring that into your NLE and combine it with the picture. From here on, anytime you cut to that clip, you will be using the synced, processed audio.
If you can’t go through such a pre-processing step in Audition or another DAW, then the noise reduction and correction must be handled within your NLE. Each of the top NLEs includes built-in noise reduction tools, but there are plenty of plug-in offerings from Waves, iZotope, Accusonus, and Crumplepop to name a few. In my opinion, such processing should be applied on the track (or audio role in FCPX) and not on the clip itself. However, raising or lowering the gain/volume of clips should be performed on the clip or in the clip mixer (Premiere Pro) first.
Track/audio role organization
Proper organization is key to an efficient mix. When a speaker is recorded multiple times or at different locations, then the quality or tone of those recordings will vary. Each situation may need to be adjusted differently in the final mix. You may also have several speakers interviewed at the same time in the same location. In that case, the same adjustments should work for all. Or maybe you only need to separate male from female speakers, based on voice characteristics.
In a track-based NLE like Media Composer, Resolve, Premiere Pro, or others, simply place each speaker onto a separate track so that effects processing can be specific for that speaker for the length of the program. In some cases, you will be able to group all of the speaker clips onto one or a few tracks. The point is to arrange VO, sync dialogue, sound effects, and music together as groups of tracks. Don’t intermingle voice, effects, or music clips onto the same tracks.
Once you have organized your clips in this manner, then you are ready for the final mix. Unfortunately this organization requires some extra steps in Final Cut Pro X, because it has no tracks. Audio clips in FCPX must be assigned specific audio roles, based on audio types, speaker names, or any other criteria. Such assignments should be applied immediately upon importing a clip. With proper audio role designations, the process can work quite smoothly. Without it, you are in a world of hurt.
Since FCPX has no traditional track mixer, the closest equivalent is to apply effects to audio lanes based on the assigned audio roles. For example, all clips designated as dialogue will have their audio grouped together into the dialogue lane. Your sequence (or just the audio) must first be compounded before you are able to apply effects to entire audio lanes. This effectively applies these same effects to all clips of a given audio role assignment. So think of audio lanes as the FCPX equivalent to audio tracks in Premiere, Media Composer, or Resolve.
The vocal chain
The objective is to get your dialogue tracks to sound consistent and stand out in the mix. To do this, I typically use a standard set of filter effects. Noise reduction processing is applied either through preprocessing (described above) or as the first plug-in filter applied to the track. After that, I will typically apply a de-esser and a plosive remover. The first reduces the sibilance of the spoken letter “s” and the latter reduces mic pops from the spoken letter “p.” As with all plug-ins, don’t get heavy-handed with the effect, because you want to maintain a natural sound.
You will want the audio – especially interviews – to have a consistent level throughout. This can be done manually by adjusting clip gain, either clip by clip, or by rubber banding volume levels within clips. You can also apply a track effect, like an automatic volume filter (Waves, Accusonus, Crumplepop, other). In some cases a compressor can do the trick. I like the various built-in plug-ins offered within Premiere and FCPX, but there are a ton of third-party options. I may also apply two compression effects – one to lightly level the volume changes, and the second to compress/limit the loudest peaks. Again, the key is to apply light adjustments, because I will also compress/limit the master output in addition to these track effects.
The last step is equalization. A parametric EQ is usually the best choice. The objective is to assure vocal clarity by accentuating certain frequencies. This will vary based on the sound quality of each speaker’s voice. This is why you often separate speakers onto their own tracks according to location, voice characteristics, and so on. In actual practice, only two to three tracks are usually needed for dialogue. For example, interviews may be consistent, but the voice-over recordings require a different touch.
Don’t get locked into the specific order of these effects. What I have presented in this post isn’t necessarily gospel for the hierarchical order in which to use them. For example, EQ and level adjusting filters might sound best when placed at different positions in this stack. A certain order might be better for one show, whereas a different order may be best the next time. Experiment and listen to get the best results!
©2020 Oliver Peters