Improving your mix with iZotope

In classic analog mixing consoles like Neve or SSL, each fader includes a channel strip. This is a series of in-line processors that can be applied to each individual input and usually consists of some combination of an EQ, gate, and compressor. If a studio mixing engineer doesn’t use the built-in effects, then they may have a rack of outboard effects units that can be patched in and out of the mixing console. iZotope offers a number of processing products that are the software equivalent of the channel strip or effects rack.

I’ve written about iZotope products in the past, so I decided to take a look at their Mix & Master Bundle Plus, which is a collection of three of their top products – Neutron 3, Nectar 3, and Ozone 9. These products, along with RX, are typically what would be of interest to most video editors or audio post mixers. RX 8 is a bundle of repair effects, such as noise reduction, click repair, and so on.

Depending on the product, it may be available within a single plug-in effect, or several plug-ins, or both a plug-in and a standalone application. For instance, RX 8 and Ozone 9 can be used within a DAW or an NLE, in addition to being a separate application. Most of the comprehensive iZotope products are available in three versions – Elements (a “lite” version), Standard, and Advanced. As the name implies, you get more features with the Advanced version; however, nearly everything an editor would want can be handled in the Standard product or, for some, in an Elements version.

The mothership

Each of these products is an AU, VST, and/or AAX plug-in compatible with most DAWs and NLEs. It shows up as a single plug-in effect, which in iZotope’s parlance is the mothership for processing modules. Each product features its own variety of processing modules, such as EQ or compression. These modules can be stacked and arranged in any order within the mothership plug-in. Instead of having three individual effects applied to a track, you would only have one iZotope plug-in, which in turn contains the processing modules that you’d like to use. While each product might offer a similar module, like EQ, these modules do not function in exactly the same way from one product to the next. The range of control or type of function will differ. For example, only Ozone 9 includes mid/side EQ. In addition to new features, this newest series of iZotope updates includes faster processing with real-time performance and some machine learning functions.
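Conceptually, the mothership/module relationship works like a simple processing chain, where each module transforms the audio in turn and the order of the stack matters. The sketch below is purely illustrative (it is not iZotope's actual code or API), and the EQ and compressor stand-ins are deliberately simplistic:

```python
# Conceptual sketch of a "mothership" plug-in hosting a re-orderable
# chain of processing modules. Not iZotope's implementation - each
# module here is just a function from a sample buffer to a sample buffer.

def eq_module(samples, gain=1.1):
    """Hypothetical EQ stand-in: apply a simple broadband gain."""
    return [s * gain for s in samples]

def compressor_module(samples, threshold=0.8, ratio=4.0):
    """Hypothetical compressor stand-in: reduce levels above a threshold."""
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            level = threshold + (level - threshold) / ratio
        out.append(level if s >= 0 else -level)
    return out

class Mothership:
    """One plug-in instance that contains a stack of modules."""
    def __init__(self, modules):
        self.modules = list(modules)  # order matters and can be rearranged

    def process(self, samples):
        for module in self.modules:
            samples = module(samples)
        return samples

chain = Mothership([eq_module, compressor_module])
processed = chain.process([0.5, -0.9, 1.0])
```

Reordering the module list changes the result, which is exactly why being able to drag modules into a different sequence within the one plug-in matters.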

If you can only buy one of these products and they perform somewhat similar tasks, how do you know what to use? First, there’s nothing to prevent you from applying Ozone, Nectar, or Neutron interchangeably to any individual track or a master bus. Or to a voice-over or a music mix. From my standpoint as a video editor using these plug-ins for the audio mix of my videos, I would simplify it this way. Nectar 3 is designed for vocal processing. Neutron 3 is designed for music. Ozone 9 is designed for mastering. If I own all three, then in a simple mix of a dialogue track against music, I would apply Nectar 3 to the dialogue track, Neutron 3 to the music track, and Ozone 9 to the master bus.

Working with iZotope’s processing

Neutron, Nectar, and Ozone each include a wealth of presets that configure a series of modules depending on the style you want – from subtle to aggressive. You can add or remove modules or rearrange their order in the chain by dragging a module left or right within the plug-in’s interface. Or start from a blank shell and build an effects chain from the module selection available within that iZotope product. Neutron offers six basic modules, Nectar nine, and Ozone eleven. Many audiophiles love vintage processing to warm up the sound. In spite of iZotope’s sleek, modern approach, you’re covered here, too. Ozone 9 includes several dedicated vintage modules for tape saturation, limiting, EQ, and compression.

All three standard versions of these products include an Assistant function. If you opt to use the Assistant, then play your track and Nectar, Neutron, or Ozone will automatically calculate and apply the modules and settings needed, based on the parameters that you choose and the detected audio from the mix or track. You can then decide to accept or reject the recommendation. If you accept, then use that as a starting point and make adjustments to the settings or add/delete modules to customize the mix.

Neutron 3 Advanced includes Mix Assistant, an automated mix that uses machine learning. Let’s say you have a song mix with stems for vocals, bass, drums, guitars, and synths. Apply the Relay effect to each track and then iZotope’s Visual Mixer to the master bus. With the Standard version, you can use the Visual Mixer to control the levels, panning, and stereo width for each track from a single interface. The Relay plug-ins control those settings on each track based on what you’ve done using the Visual Mixer controls. If you have Neutron 3 Advanced, then this is augmented by Mix Assistant. Play the song through and let Mix Assistant set a relative balance based on your designated focus tracks. In other words, you can tell the algorithm whether vocals or guitars should be the focus and thereby dominant in the mix.

Note that iZotope regularly updates versions with new features, which may or may not be needed in your particular workflow. As an example, RX 8 was just released with new features over RX 7. But if you owned an earlier version, then it might still do everything you need. While new features are always welcome, don’t feel any pressure that you have to update. Just rest assured that iZotope is continually taking customer feedback and developing its products.

Be sure to check out iZotope’s wealth of tutorials and learning materials, including their “Are you listening?” YouTube series. Even if you don’t use any iZotope products, Grammy-nominated mastering engineer Jonathan Wyner offers plenty of great tips for getting the best out of your mixes.

©2020 Oliver Peters

Soundtheory Gullfoss Intelligent EQ

There are zillions of audio plug-ins on the market to enhance your DAW or NLE. In most cases, the operation and user interface design are based on familiar physical processing hardware. Often the user interface design is intentionally skeuomorphic, either as a direct analog to the physical version or as a prompt to give you a clue about its processed sound and control functions.

When you first open the Gullfoss equalizer plug-in, you might think it works like many other EQ plug-ins. Grab a frequency point on the graph line, pull it up or down, and spread out or tighten the Q value. But you would be totally wrong. In fact, this is a plug-in that absolutely requires you to read the manual. Check out the tutorial videos on the Soundtheory site and its operation will make sense to you.

Soundtheory launched Gullfoss (which gets its name from the Gullfoss waterfall in Iceland) as its first commercial product after years of research into perceived loudness. According to Soundtheory, Gullfoss is not using artificial intelligence or other machine learning algorithms. Instead, it employs their computational auditory perception technology. More on that in a moment.

Gullfoss installs as an AU, VST, and AAX plug-in, so it’s compatible with a wide range of DAWs and NLEs. License management is handled via iLok – something most Pro Tools users are very familiar with. If you don’t own a physical iLok USB key (dongle), then licenses are managed through the iLok License Manager application, which you install on your computer along with a free iLok account. iLok management allows you to move the plug-in authorization between computers.

The Gullfoss equalization technology is based on balancing dominant and dominated frequencies. The plug-in automatically determines what it considers dominant and dominated frequencies and dynamically updates its processing 300 times per second. User control is via the Recover and Tame controls.

Increasing the Recover value accentuates dominated frequencies, while Tame adjusts the emphasis of dominant frequencies in the mix. Bias controls the balance between Recover and Tame. A positive value shifts more of the processing toward the Recover frequencies, whereas a negative value shifts the emphasis towards Tame. Brighten tells the Recover/Tame mechanism to prefer lower or higher frequencies. Boost balances low versus mid frequencies. Positive values favor bass, while negative Boost values decrease bass and increase mids. Finally, there’s an overall gain control and, of course, Bypass.

By default, you are applying Gullfoss processing to the complete sound spectrum of a track. There are left and right range boundaries that you can slide inwards. This restricts the frequencies being analyzed and processed to the area between the two boundary lines. For instance, you can use this with a tight range to make Gullfoss function like a de-esser. If you invert the range by sliding the left or right lines past each other, then the processing occurs outside of that range.
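The boundary logic is easy to model conceptually. In this hypothetical sketch (not Soundtheory's implementation), a frequency is processed when it falls between the left and right boundaries, and the test flips when the boundaries have been slid past each other:

```python
# Hypothetical sketch of Gullfoss-style range boundaries - not
# Soundtheory's code. A frequency is analyzed/processed when it falls
# inside the range; sliding the boundaries past each other inverts the
# test, so processing happens outside the range instead.
def in_processed_band(freq_hz, left_hz, right_hz):
    if left_hz <= right_hz:
        # normal range: process between the boundaries
        return left_hz <= freq_hz <= right_hz
    # inverted range: process outside the (right, left) span
    return freq_hz <= right_hz or freq_hz >= left_hz

# Normal range 500-2000 Hz includes 1 kHz; the inverted range excludes it.
print(in_processed_band(1000, 500, 2000))   # True
print(in_processed_band(1000, 2000, 500))   # False
```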

One tip Soundtheory offers as a starting point is to set the Recover and Tame controls each to 50. Then adjust Bias and Brighten so that the small meters to the left and bottom of the graph hover around their zero marks. From that baseline, adjust further as needed. Quite frankly, it requires a bit of experimentation to determine how best to use it. Naturally, whether or not you like the result depends on your own taste. In general, this EQ probably appeals more to music mixers and less to video editors or audio post engineers. I found that it worked nicely as a mastering EQ at the end of a mix chain or applied to a completed, mixed track.

I’m a video editor and not a music mixer, so I also tested files from a corporate production, consisting of a dialogue and a music stem. I ran two tests – once on the fully mixed and exported track and again on the mix with the two stems kept separate. I found that the processing sounded best when I kept the stems separate and applied Gullfoss to the master bus. Of course, this isn’t the ideal scenario, because the voices and music cues change within each stem. However, with a bit of experimentation I found a setting that worked overall. It resulted in a mix that sounded clearer and more open. Under a proper mix scenario, each voice and each music cue would be on separate tracks for individual adjustments prior to hitting the Gullfoss processing.

In regards to music mixes, it sounded best to me with tracks that weren’t extremely dense. For example, acoustic-style songs with vocals, acoustic guitars, or woodwind-based tracks seemed to benefit the most from Gullfoss. When it works well, the processing really opens up the track – almost like removing a layer of mushiness from the sound. When it was less effective, the results weren’t bad – just more in the take-it-or-leave-it category. The Soundtheory home page features several before and after examples. As a video editor, I did find that it had value when applied to a music track that I might use in a mix with voice-over. However, for voice control, I would stick with a traditional EQ plug-in. If I need de-essing, then I would use a traditional, dedicated de-esser.

Gullfoss is a nice tool to have in the toolkit for music and mastering mixers, even though it wouldn’t be the only EQ you’d ever use. However, it can be that sparkle that brings a song up a notch. Some mixers have commented that Gullfoss saved them a ton of time versus sculpting a sound with standard EQs. When it’s at its most effective, Gullfoss processing adds that “glue” that mixers want for a music track or song.

©2020 Oliver Peters

Dialogue Mixing Tips

 

Video is a visual medium, but the audio side of a project is as important as – and often more important than – the picture side. When story context is based on dialogue, then the story will make no sense if you can’t hear or understand that spoken information. In theatrical mixes, it’s common for a three-person team of rerecording mixers to operate the console for the final mix. Their responsibilities are divided into dialogue, sound effects, and music. The dialogue mixer is usually the team lead, precisely because intelligible dialogue is paramount to a successful motion picture mix. For this reason, dialogue is also mixed primarily as mono coming from the center speaker in a 5.1 surround set-up.

A lot of my work includes documentary-style entertainment and corporate projects, which frequently lean on recorded interviews to tell the story. In many cases, sending the mix outside isn’t in the budget, which means the mix falls to me. You can mix in a DAW or in your NLE. Many video editors are intimidated by or unfamiliar with Pro Tools or Logic Pro X – or even the Fairlight page in DaVinci Resolve. Rest assured that every modern NLE is capable of turning out an excellent stereo mix for the purposes of TV, web, or mobile viewing. Given the right monitoring and acoustic environment, you can also turn out solid LCR or 5.1 surround mixes, adequate for TV viewing.

I have covered audio and mix tips in the past, especially when dealing with Premiere. The following are a few more pointers.

Original location recording

You typically have no control over the original sound recording. On many projects, the production team will have recorded double-system sound controlled by a separate location mixer (recordist). They generally use two microphones on the subject – a lav and an overhead shotgun/boom mic.

The lav will often be tucked under clothing to filter out ambient noise from the surrounding environment and to hide it from the camera. This will sound closer, but may also sound a bit muffled. There may also be occasional rustle from clothing rubbing against the mic as the speaker moves around. For these reasons, I will generally select the shotgun as the microphone track to use. The speaker’s voice will sound better and the recording will tend to “breathe.” The downside is that you’ll also pick up more ambient noise, such as HVAC fans running in the background. Under the best of circumstances, these will be present during quiet moments, but not too noticeable when the speaker is actually talking.

Processing

The first stage of any dialogue processing chain or workflow is noise reduction and gain correction. At the start of the project you have the opportunity to clean up any raw voice tracks. This is ideal, because it saves you from having to do that step later. In the double-system sound example, you have the ability to work with the isolated .wav file before syncing it within a multicam group or as a synchronized clip.

Most NLEs feature some audio noise reduction tools, and you can certainly augment these with third-party filters and standalone apps, like those from iZotope. However, this is generally a process I will handle in Adobe Audition, which can process single tracks as well as multitrack sessions. Audition starts with a short noise print (select a short quiet section in the track) used as a reference for the sounds to be suppressed. Apply the processing and adjust settings if the dialogue starts sounding like the speaker is underwater. Leaving some background noise is preferable to over-processing the track.

Once the noise reduction is where you like it, apply gain correction. Audition features an automatic loudness match feature or you can manually adjust levels. The key is to get the overall track as loud as you can without clipping the loudest sections and without creating a compressed sound. You may wish to experiment with the order of these processes. For example, you may get better results adjusting gain first and then applying the noise reduction afterwards.
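To illustrate the "as loud as you can without clipping" idea, here is a hedged sketch of simple peak normalization. This is not Audition's loudness-match algorithm – just the basic math of finding the gain that brings the loudest peak up to a chosen ceiling:

```python
# Peak normalization sketch (illustrative, not Audition's algorithm):
# compute the gain that raises the loudest sample to a target ceiling,
# e.g. -3 dBFS, guaranteeing no clipping.

def db_to_linear(db):
    """Convert a dBFS value to a linear amplitude factor."""
    return 10 ** (db / 20.0)

def normalize_peak(samples, ceiling_db=-3.0):
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return samples  # silence: nothing to normalize
    gain = db_to_linear(ceiling_db) / peak
    return [s * gain for s in samples]

quiet = [0.1, -0.25, 0.2]
louder = normalize_peak(quiet)  # the loudest peak now sits at -3 dBFS
```

Loudness matching goes further than this (it measures perceived loudness rather than peaks), but the peak ceiling is the hard limit that protects against clipping.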

After both of these steps have been completed, bounce out (export) the track to create a new, processed copy of the original. Bring that into your NLE and combine it with the picture. From here on, anytime you cut to that clip, you will be using the synced, processed audio.

If you can’t go through such a pre-processing step in Audition or another DAW, then the noise reduction and correction must be handled within your NLE. Each of the top NLEs includes built-in noise reduction tools, but there are plenty of plug-in offerings from Waves, iZotope, Accusonus, and Crumplepop to name a few. In my opinion, such processing should be applied on the track (or audio role in FCPX) and not on the clip itself. However, raising or lowering the gain/volume of clips should be performed on the clip or in the clip mixer (Premiere Pro) first.

Track/audio role organization

Proper organization is key to an efficient mix. When a speaker is recorded multiple times or at different locations, then the quality or tone of those recordings will vary. Each situation may need to be adjusted differently in the final mix. You may also have several speakers interviewed at the same time in the same location. In that case, the same adjustments should work for all. Or maybe you only need to separate male from female speakers, based on voice characteristics.

In a track-based NLE like Media Composer, Resolve, Premiere Pro, or others, simply place each speaker onto a separate track so that effects processing can be specific for that speaker for the length of the program. In some cases, you will be able to group all of the speaker clips onto one or a few tracks. The point is to arrange VO, sync dialogue, sound effects, and music together as groups of tracks. Don’t intermingle voice, effects, or music clips onto the same tracks.

Once you have organized your clips in this manner, then you are ready for the final mix. Unfortunately this organization requires some extra steps in Final Cut Pro X, because it has no tracks. Audio clips in FCPX must be assigned specific audio roles, based on audio types, speaker names, or any other criteria. Such assignments should be applied immediately upon importing a clip. With proper audio role designations, the process can work quite smoothly. Without it, you are in a world of hurt.

Since FCPX has no traditional track mixer, the closest equivalent is to apply effects to audio lanes based on the assigned audio roles. For example, all clips designated as dialogue will have their audio grouped together into the dialogue lane. Your sequence (or just the audio) must first be compounded before you are able to apply effects to entire audio lanes. This effectively applies these same effects to all clips of a given audio role assignment. So think of audio lanes as the FCPX equivalent to audio tracks in Premiere, Media Composer, or Resolve.

The vocal chain

The objective is to get your dialogue tracks to sound consistent and stand out in the mix. To do this, I typically use a standard set of filter effects. Noise reduction processing is applied either through preprocessing (described above) or as the first plug-in filter applied to the track. After that, I will typically apply a de-esser and a plosive remover. The first reduces the sibilance of the spoken letter “s” and the latter reduces mic pops from the spoken letter “p.” As with all plug-ins, don’t get heavy-handed with the effect, because you want to maintain a natural sound.

You will want the audio – especially interviews – to have a consistent level throughout. This can be done manually by adjusting clip gain, either clip by clip, or by rubber banding volume levels within clips. You can also apply a track effect, like an automatic volume filter (Waves, Accusonus, Crumplepop, or others). In some cases a compressor can do the trick. I like the various built-in plug-ins offered within Premiere and FCPX, but there are a ton of third-party options. I may also apply two compression effects – one to lightly level the volume changes, and the second to compress/limit the loudest peaks. Again, the key is to apply light adjustments, because I will also compress/limit the master output in addition to these track effects.
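The two-stage idea can be sketched with static gain curves. The thresholds and ratios below are made-up illustration values, and real compressors add attack/release smoothing, but it shows how a light leveler followed by a hard limiter divides the work:

```python
# Illustrative static compression curves (no attack/release smoothing).
# Threshold and ratio values are hypothetical, chosen only for the demo.

def compress_db(level_db, threshold_db, ratio):
    """Static gain curve: levels above threshold are reduced by the ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def two_stage(level_db):
    leveled = compress_db(level_db, threshold_db=-18.0, ratio=2.0)  # light leveling
    limited = compress_db(leveled, threshold_db=-6.0, ratio=20.0)   # hard limiting
    return limited

# A -12 dB passage is only gently leveled; a +12 dB peak gets caught
# by the limiter stage as well.
print(two_stage(-12.0))
print(two_stage(12.0))
```

Because the first stage works gently across the whole range, the second stage only has to catch the few peaks that survive it, which keeps both adjustments light.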

The last step is equalization. A parametric EQ is usually the best choice. The objective is to assure vocal clarity by accentuating certain frequencies. This will vary based on the sound quality of each speaker’s voice. This is why you often separate speakers onto their own tracks according to location, voice characteristics, and so on. In actual practice, only two to three tracks are usually needed for dialogue. For example, interviews may be consistent, but the voice-over recordings require a different touch.

Don’t get locked into the specific order of these effects. What I have presented in this post isn’t necessarily gospel for the hierarchical order in which to use them. For example, EQ and level adjusting filters might sound best when placed at different positions in this stack. A certain order might be better for one show, whereas a different order may be best the next time. Experiment and listen to get the best results!

©2020 Oliver Peters

SOUND FORGE Pro Revisited

I’ve reviewed SOUND FORGE a number of times over the years, most recently in 2017. Since its initial development, it has migrated from Sonic Foundry to Sony Creative Software and most recently Magix, a German software developer. Magix’s other products are PC-centric, but SOUND FORGE comes in both Mac and Windows versions.

The updated 3.0 version of SOUND FORGE Pro for the Mac was released in 2017. Although no 4.0 version has been released in the interim, 3.0 was developed as a 64-bit app. Current downloads are, of course, an updated build. Across the product line, there are several versions and bundles, including “lite” SOUND FORGE versions. However, Mac users can only choose between SOUND FORGE Pro Mac 3 or Audio Master Suite Mac. Both include SOUND FORGE Pro Mac, iZotope RX Elements, and iZotope Ozone Elements. The Audio Master Suite Mac adds the Steinberg SpectraLayers Pro 4 analysis/repair application. It’s not listed, but the download also includes the Convrt application, which is an MP3 batch conversion utility.

SOUND FORGE Pro is designed as a dedicated audio mastering application that performs precision audio editing. You can record, edit, and process multichannel audio files (up to 32 tracks) at bit depths of 24-bit, 32-bit, or 64-bit float and sample rates up to 192kHz. In addition to the iZotope Elements packages, SOUND FORGE Pro comes with a variety of its own AU plug-ins. Any other AU and VST plug-ins already installed on your system will also show up and work within the application.

Even though SOUND FORGE Pro is essentially a single file editor (as compared with a multi-track DAW, like Pro Tools), you can work with multiple individual files. Multiple files are displayed within the interface as horizontal tabs or in a vertical stack. You can process multiple files at the same time and can copy and paste between them. You can also copy and paste between individual channels within a single multichannel file.

As an audio editor, it’s fast, tactile, and non-destructive, making it ideal for music editing, podcasts, radio interviews, and more. For audio producers, it complies with Red Book Standard CD authoring. The attraction for video editors is its mastering tools, especially loudness control for broadcast compliance. Both Magix’s Wave Hammer and iZotope Ozone Elements’ mastering tools are great for solving loudness issues. That’s aided by accurate LUFS metering. Other cool tools include AutoTrim, which automatically removes gaps of silence at the beginnings and ends of files or from regions within a file.

There is also élastique Timestretch, a processing tool to slow down or speed up audio, while maintaining the correct pitch. Timestretch can be applied to an entire file or simply a section within a file. Effects tools and plug-ins are divided into groups that require processing or those that can be played in real-time. For example, Timestretch is applied as a processing step, whereas a reverb filter would play in real time. Processing is typically fast on any modern desktop or laptop computer, thanks to the application’s 64-bit engine.

Basic editing is as simple as marking a section and hitting the delete key. You can also split a file into events and then trim, delete, move, or copy & paste event blocks. If you slide an event to overlap another, a crossfade is automatically created. You can adjust the fade-in/fade-out slopes of these crossfades.

Even if you already have Logic Pro X, Audition, or Pro Tools installed, SOUND FORGE Pro Mac may still be worth the investment for its simplicity and mastering focus.

©2020 Oliver Peters

ADA Compliance

The Americans with Disabilities Act (ADA) has enriched the lives of many in the disabled community since its introduction in 1990. It affects all of our lives, from wheelchair-friendly ramps on street corners and business entrances to the various accessibility modes in our computers and smart devices. While many editors don’t have to deal directly with the impact of the ADA on media, the law does affect broadcasters and streaming platforms. If you deliver commercials and programs, then your production will be affected in one way or another. Typically the producer is not directly subject to compliance, but the platform is. This means someone has to provide the elements that complete compliance as part of any distribution arrangement, whether it is the producer or the outlet itself.

Two components are involved to meet proper ADA compliance: closed captions and described audio (aka audio descriptions). Captions come in two flavors – open and closed. Open captions, or subtitles, consist of text “burned” into the image. They are customarily used when a foreign language is spoken in an otherwise English program (or the equivalent in non-English-speaking countries). Closed captions are enclosed in a data stream that can be turned on and off by the viewer, device, or platform and are intended to make the dialogue accessible to the hearing-impaired. Closed captions are often also turned on in noisy environments, like a TV playing in a gym or a bar.

Audio descriptions are intended to aid the visually-impaired. This is a version of the audio mix with an additional voice-over element. An announcer describes visual information that is not readily obvious from the audio of the program itself. This voice-over fills in the gaps, such as “man climbs to the top of a large hill” or “logos appear on screen.”

Closed captions

Historically, post houses and producers have opted to outsource caption creation to companies that specialize in those services. However, modern NLEs enable any editor to handle captions themselves, and the increasing enforcement of ADA compliance is now adding to the deliverable requirements for many editors. With this increased demand, using a specialist may become cost prohibitive; therefore, built-in tools are all the more attractive.

There are numerous closed caption standards and various captioning file formats. The most common are .scc (Scenarist), .srt (SubRip), and .vtt (preferred for the web). Captions can be supplied as “embedded” (secondary data within the master file) or as a separate “sidecar” file, which is intended to play in sync with the video file. Not all of these are equal. For example, .scc files (embedded or as sidecar files) support text formatting and positioning, while .srt and .vtt do not. Say you have a lower-third name graphic come on screen – you want to move any caption from its usual lower-third, safe-title position to the top of the screen while that name graphic is visible. This way both remain legible. The .scc format supports that, but the other two don’t. The visual appearance of the caption text is a function of the playback hardware or software, so the same captions look different in QuickTime Player versus Switch or VLC. In addition, SubRip (.srt) captions all appear at the bottom, even if you repositioned them to the top, while .vtt captions appear at the top of the screen.

You may prefer to first create a transcription of the dialogue using an outside service, rather than simply typing in the captions from scratch. There are several online resources that automate speech-to-text, including SpeedScriber, Simon Says, Transcriptive, and others. Since AI-based transcription is only as good as the intelligibility of the audio and the dialects of the speakers, they all require further text editing/correction through an online tool before they are ready to use.

One service that I’ve used with good results is REV.com, which uses human transcribers for greater accuracy, as well as offering an online text editing tool. The transcription can be downloaded in various formats, including simple text (.txt). Once you have a valid transcription, that file can be converted through a variety of software applications into .srt, .scc, or .vtt files. These in turn can be imported into your preferred NLE for timing, formatting, and positioning adjustments.
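As an illustration of what that conversion step produces, here is a minimal Python sketch that turns a list of timed cues into SubRip (.srt) text. The input format – a list of (start, end, text) tuples – is a hypothetical simplification; real transcription services deliver their own formats:

```python
# Minimal transcript-to-SRT conversion sketch. The (start_sec, end_sec,
# text) cue list is a hypothetical input format for illustration.

def fmt_srt_time(seconds):
    """SRT timestamps use HH:MM:SS,mmm with a comma before milliseconds."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def cues_to_srt(cues):
    """Build numbered SRT blocks separated by blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{fmt_srt_time(start)} --> {fmt_srt_time(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"

print(cues_to_srt([(0.0, 2.5, "Welcome to the show."),
                   (2.5, 5.0, "Let's get started.")]))
```

Note that this plain .srt output carries no positioning or formatting data, which is exactly the limitation described above.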

Getting the right look

There are guidelines that captioning specialists follow, but some are merely customary and do not affect compliance. For example, upper and lower case text is currently the norm, but you’ll still be OK if your text is all caps. There are also accepted norms when English (or other) subtitles appear on screen, such as for someone speaking in a foreign language. In those cases, no additional closed caption text is used, since the subtitle already provides that information. However, a caption may appear at the top of the screen identifying that a foreign language is being spoken. Likewise, during sections with only music or ambient sounds, a caption may briefly identify it as such.

When creating captions, you have to understand that readability is key, so the text will not always run perfectly in sync with the dialogue. For instance, when two actors engage in rapid fire dialogue, each caption may stay on longer than the spoken line. You can adjust the timing against that scene so that they eventually catch up once the pace slows down. It’s good to watch a few captioned programs before starting from scratch – just to get a sense of what works and what doesn’t.

If you are creating captions for a program to run on a specific broadcast network or streaming service, then it’s a good idea to find out if they provide a style guide for captions.

Using your NLE to create closed captions

Avid Media Composer, Adobe Premiere Pro, DaVinci Resolve, and Apple Final Cut Pro X all support closed captions. I find FCPX to be the best of this group, because of its extensive editing control over captions and ease of use. This includes text formatting, but also display methods, like pop-on, paint-on, and roll-up effects. Import .scc files for maximum control or extract captions from an existing master, if your media already has embedded caption data. The other three NLEs place the captions onto a single data track (like a video track) within which captions can be edited. Final Cut Pro X places them as a series of connected clips, like any other video clip or graphic. If you perform additional editing, the FCPX magnetic timeline takes care of keeping the captions in sync with the associated dialogue.

Final Cut’s big plus for me is that validation errors are flagged in red. Validation errors occur when caption clips overlap, may be too short for the display method (like a paint-on), are too close to the start of the file, or other errors. It’s easy to find and fix these before exporting the master file.
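The overlap and minimum-duration checks are straightforward to reason about. This hypothetical sketch (not Apple's actual validation logic, and with a made-up minimum duration) shows the idea:

```python
# Hypothetical caption validation sketch - not FCPX's implementation.
# Flags cues that overlap the previous cue or fall under a minimum
# duration (the 0.5 s value here is invented for the demo).

def validate_cues(cues, min_duration=0.5):
    """cues: list of (start_sec, end_sec) tuples sorted by start time."""
    errors = []
    for i, (start, end) in enumerate(cues):
        if end - start < min_duration:
            errors.append((i, "too short"))
        if i > 0 and start < cues[i - 1][1]:
            errors.append((i, "overlaps previous cue"))
    return errors

# Cue 1 starts before cue 0 ends; cue 2 is shorter than the minimum.
print(validate_cues([(0.0, 2.0), (1.8, 4.0), (4.0, 4.2)]))
```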

Deliverables

NLEs support the export of a master file with embedded captions, or “burned” into the video as a subtitle, or the captions exported as a separate sidecar file. Specific format support for embedded captions varies among applications. For example, Premiere Pro – as well as Adobe Media Encoder – will only embed captioning data when you export your sequence or encode a file as a QuickTime-wrapped master file. (I’m running macOS, so there may be other options with Windows.)

On the other hand, Apple Compressor and Final Cut Pro X can encode or export files with embedded captions for formats such as MPEG-2 TS, MPEG-2 PS, or MP4. It would be nice if all these NLEs supported the same range of formats, but they don’t. If your goal is a sidecar caption file instead of embedded data, then it’s a far simpler and more reliable process.

Audio descriptions

Compared to closed captions, providing audio description files is relatively easy. These can either be separate audio files – used as sidecar files for secondary audio – or additional tracks on the delivery master. Sometimes it’s a completely separate video file with only this version of the mix. Advanced platforms like Netflix may also require an IMF (Interoperable Master Format) package, which would include an audio description track as part of that package. When audio sidecar files are requested for the web or certain playback platforms, like hotel TV systems, the common deliverable formats are .mp3 or .m4a. The key is that the audio track should be able to run in sync with the rest of the program.

Producing an audio description file doesn’t require any new skills. A voice-over announcer describes any on-screen action that wouldn’t otherwise make sense if you were only listening to the audio. Think of it like a radio play or podcast version of your TV program. This can be as simple as fitting additional VO into the gaps between actor/host/speaker dialogue. If you have access to the original files (such as a Pro Tools session) or dialogue/music/effects stems, then you have some latitude to adjust audio elements in order to fit in the additional voice-over lines. For example, off-camera dialogue may sometimes be moved or edited in order to make more space for the VO descriptions, while on-camera/sync dialogue is left untouched. Some of the other audio may also be ducked or muted to make space for even longer descriptions.

Some of the same captioning service providers also offer audio description services, using their pool of announcers. Yet there’s nothing about the process that any producer or editor couldn’t handle themselves. Scripting the extra lines, hiring and directing talent, and producing the final mix require only a bit more time added to the schedule, yet permit the most creative control.

ADA compliance has been around since 1990, but hasn’t been widely enforced outside of broadcast. That’s changing, and with the new NLE tools there are no more excuses. It’s become easier than ever for any editor or producer to provide the proper elements to reach every potential viewer.

For additional information, consult the FCC guidelines on closed captions.

The article was originally written for Pro Video Coalition.

©2020 Oliver Peters