THE AUDIO BAND is usually taken to mean "20 Hz - 20 kHz", meaning that humans can hear a range between twenty vibrations per second and twenty thousand. But to paraphrase Orwell, not all frequencies are created equal. Physics and human evolution dictate that the extremes of that range aren't very important. Even more interesting: what you might consider the entire top half of that range really doesn't have much going on!
This is vital to anyone working with sound, and particularly to those working with sound for film or video. Not understanding how this works can lead you to bad equipment decisions and bad mixes.
Fortunately, you don't need golden ears to understand which frequencies are important. Regular ears will do fine. But you'll also need your eyes and brain, and a few screenshots.
First, a tiny bit of math. Very tiny. If your eyes start to glaze, skip down to the next paragraph.
Frequency is measured in Hertz (Hz), the number of times per second a wave vibrates. Hearing is logarithmic: at low frequencies, a shift of only a few Hertz sounds like the same musical step as a shift of many hundreds of Hertz higher up. For example, C and D at the bottom of the piano are 32.7 Hz and 36.7 Hz, just 4 Hz apart. That same interval at the top of the keyboard is a difference of more than 500 Hz. The hearing range of "20 Hz - 20 kHz" might lead you to think the middle of the band — the dividing point between bass and treble — is around 10 kHz. But that's a very high pitch indeed, and only a few orchestral instruments even reach it. Perceptually, the middle of the audio band is actually around 1 kHz.
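You can see the logarithmic spacing in numbers with the standard equal-temperament formula. A quick sketch (note numbers follow the common MIDI convention, where A4 = note 69 = 440 Hz):

```python
def note_freq(midi_note):
    """Equal-temperament frequency in Hz for a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)

c1, d1 = note_freq(24), note_freq(26)    # C and D at the bottom of the piano
c8, d8 = note_freq(108), note_freq(110)  # the same interval at the top

print(round(d1 - c1, 1))   # about 4 Hz
print(round(d8 - c8))      # about 513 Hz -- the same musical step, 128x wider
```

The same whole step spans 4 Hz at one end of the keyboard and over 500 Hz at the other, which is exactly why the perceptual middle of the band sits far below the arithmetic middle.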
The point of all this? Low frequencies pack more information per Hertz than higher ones, and the bottom 5 kHz of a soundtrack is more important than the 15 kHz above.
Battle of the bands
So let's get into it. I cut together some typical soundtrack elements: female and male announcers, synthesized and orchestral library music, and female and male pop singers with their groups. The montage sounds like this:
Full range (20 Hz - 20 kHz) for reference.
A spectrogram is a way to look at frequency and intensity over time. If we draw one of that montage, we get:
It's pretty intuitive, particularly if you play the montage while looking at it. I added some labels to make things easier.
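If you're curious how such a picture is built, a spectrogram is just a stack of short-time Fourier transforms: slice the audio into overlapping windows, take the spectrum of each, and plot the magnitudes over time. A minimal numpy-only sketch (the window and hop sizes here are arbitrary choices, not what any particular tool uses):

```python
import numpy as np

def spectrogram(x, fs, nfft=1024, hop=512):
    """Magnitude spectrogram in dB via a Hann-windowed short-time FFT."""
    win = np.hanning(nfft)
    frames = [x[i:i + nfft] * win for i in range(0, len(x) - nfft, hop)]
    mags = np.abs(np.fft.rfft(frames, axis=1))   # one spectrum per time slice
    freqs = np.fft.rfftfreq(nfft, 1 / fs)        # bin center frequencies in Hz
    return freqs, 20 * np.log10(mags + 1e-12)    # dB, as a display would show

# sanity check: a 1 kHz test tone should peak at the bin nearest 1 kHz
fs = 8000
t = np.arange(fs) / fs
freqs, db = spectrogram(np.sin(2 * np.pi * 1000 * t), fs)
peak_bin = db.mean(axis=0).argmax()
print(round(freqs[peak_bin]))  # → 1000
```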
Now let's just add one other factor: sharp filters, to let us hear just part of the audio band without being distracted. I ran our montage through some lab-quality filters to isolate approximately one octave of sound each time. The two lowest bands are wider than an octave, because there isn't much information in either one; tech details are at the end of this article.
If a filtered version has sound, it was there in the original. If it's silent, there wasn't anything in that band to begin with.
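If you want to try the same trick on your own material, a crude digital stand-in for those filters is an FFT "brick wall" bandpass: zero every frequency bin outside the band. (The filters used for this article were analog-style 24 dB/octave designs, described at the end; the band edges below are just one of the octaves demonstrated later.)

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude brick-wall bandpass: zero every FFT bin outside [lo, hi] Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(spec, len(x))

fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)  # 100 Hz + 1 kHz
y = bandpass_fft(x, fs, 600, 1200)  # keep only the 600 Hz - 1.2 kHz octave
# y now contains only the 1 kHz tone; the 100 Hz component is gone
```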
So that's it: frequency slices of our montage, and a way to hear and see exactly what's going on in each one! The results might surprise you (if not, at least you can see some pretty graphs). Let's begin.
NOTE: The audio samples on this page are AAC, encoded with multiple passes in a very high-quality commercial encoder. They should play in any modern HTML5 browser, including current versions of IE, Safari, Chrome, and Firefox. If you don't see playback controls and you're running a compliant browser, send me your full system specs so I can report the bug; in return, I'll find you some mp3 versions. If you're not running a current browser, update it and try this page again.
Here's the extreme bass, between 10 and 100 Hz. It's more than a single octave, but even so, there's not much happening:
10 - 100 Hz
Most of this bottom band is filtered out during dialog recording, to avoid noise. But it hardly matters... as you can see, none of the female voice and only a tiny bit of the male voice extends that low. What's also interesting: very little of the scoring music goes below 60 Hz. And these are well-recorded cues, from a professional library.
Don't hear much, even at the end of the montage where the spectrogram shows activity? Blame your speakers. Or right-click and download the audio file, and examine it in your own player.
Here's the midbass, 100 - 300 Hz (there's activity above 300 Hz because even my lab filters aren't perfect).
100 - 300 Hz
Now the voices start to come alive: these are the fundamental frequencies for most vowels. But you can't tell male from female here; that happens at higher harmonics. (Surprise: women's and men's voices aren't that different in the bass. That's because the relative size of the throat and mouth doesn't change very much - in physics terms - between even a small woman and a large man.)
The music uses these frequencies primarily for accompaniment.
300 - 600 Hz
These are the lower harmonics of voice frequencies, formed by resonances in the mouth, and this is the first band where you can discern individual vowels... which you can see gliding as the speakers vary their intonation. This is a critical band when mixing a film or video track, since it contains most of the energy of both voices and melodic instruments.
There's a real potential for soundtrack elements to compete here, which is why alternate versions of scoring music, without melody lines, are often easier to mix. For the same reason, jingles and other music where the lyrics are important usually don't have melodies under the singing.
600 Hz - 1.2 kHz
Note how the female voice, naturally brighter, is stronger in this band. But these frequencies aren't particularly critical for dialog, and you can mix music hotter here. Also, musical activity seems more organized in this image than the previous ones: this band contains the harmonics that let you tell one instrument from another.
1.2 kHz - 2.4 kHz
This is a critical band for dialog: there's enough harmonic energy to tell most vowels apart, and all of the consonants start around here.
Our example singers are particularly strong in this range because they're trained to sing in the mask, opening resonances in their face to emphasize harmonics. But despite the activity in this range, volumes aren't as loud as they were an octave below.
2.4 kHz - 4.8 kHz
While most vowels have harmonics up here, they're not important for intelligibility and merely establish presence. (Telephones cut off at 3.5 kHz, yet retain enough of a voice that you can identify who's speaking.) This is a critical band for the brass instruments in the orchestra, which are rich in upper harmonics.
4.8 kHz - 9.6 kHz
This is sizzle country. You can hear just a little of the female voice, and only the friction consonants from the male voice. The synthesized music is almost completely gone. There's still some strength in the pop pieces, primarily upper harmonics of the strings, the lead guitar, and the percussion.
This band is important for adding life and brightness — that's why most radio music is mixed to emphasize it — but doesn't convey information.
9.6 kHz - 20 kHz
There's hardly anything going on with dialog (remember, the green areas are so soft they're almost inaudible). Only the orchestral brass and artificial harmonics added to the pop have any energy. If you listened to this track by itself, you'd have no idea what was going on.
The presence of these frequencies may help things sound more live than canned, but US analog television and FM radio are limited to 15 kHz and most people are satisfied with their sound. Most classic movies cut off at 12 kHz, because of limitations in a theatrical optical track. In fact, it takes really good ears to even hear the top of this band.
The Nitty Gritty
I chose most of these bands to be an octave wide, so each would contain the same number of musical notes. (The two lowest bands are considerably wider because hearing is less acute down there.) But the frequencies aren't magic. I chose them to reveal interesting things about voice and music, not because you necessarily should be equalizing at them. However, you can make some general conclusions:
Don't worry about dialog below 80 Hz. Attempting to boost down here will just make things muddy. Leave this band for music and sound effects.
Feel free to boost music a couple of dB somewhere around 800 Hz, with a broad curve.
Watch out for music interfering with dialog between 1.5 kHz and 3.5 kHz. This is where the consonants live. Dipping the music a few dB in this range can make the mix sound smoother.
If you're boosting above 14 kHz or so to make a track brighter, you're probably just adding noise. Try the same thing between 7 kHz and 10 kHz for better results.
And for heaven's sake, don't obsess about 96 kHz or 192 kHz recording in a film. While working up there does present some advantages for critical productions, it's more important to get the area below 10 kHz sounding right.
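As one concrete example, the broad 800 Hz music boost suggested above maps neatly onto a standard peaking-EQ biquad. This sketch uses the coefficient formulas from Robert Bristow-Johnson's widely circulated Audio EQ Cookbook; the Q of 0.7 is an assumption for a "broad curve", not a value from this article:

```python
import numpy as np

def peaking_eq(fs, f0, gain_db, q):
    """Biquad peaking-EQ coefficients (RBJ Audio EQ Cookbook formulas)."""
    big_a = 10 ** (gain_db / 40)          # square root of the linear gain
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * big_a, -2 * np.cos(w0), 1 - alpha * big_a])
    a = np.array([1 + alpha / big_a, -2 * np.cos(w0), 1 - alpha / big_a])
    return b / a[0], a / a[0]

# a broad +2 dB bump centered at 800 Hz, at a 48 kHz sample rate
b, a = peaking_eq(48000, 800, 2.0, q=0.7)

# verify the gain at the center frequency
z = np.exp(-2j * np.pi * 800 / 48000)
h = (b[0] + b[1] * z + b[2] * z**2) / (a[0] + a[1] * z + a[2] * z**2)
print(round(20 * np.log10(abs(h)), 2))  # → 2.0 (dB, as designed)
```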
These tracks were made using 24 dB/octave filters at the indicated frequencies - and nothing else: no volume adjustment, no tweaking. If a particular band sounds very soft, that's because nothing was going on at those frequencies.
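For reference, a 4th-order Butterworth bandpass has those same 24 dB/octave skirts. A sketch using scipy, with the 600 Hz - 1.2 kHz band from above as the example (any of the bands would work the same way):

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 48000
# a 4th-order Butterworth bandpass: each edge rolls off at 24 dB/octave
sos = butter(4, [600, 1200], btype='bandpass', fs=fs, output='sos')

t = np.arange(fs) / fs
inband = sosfilt(sos, np.sin(2 * np.pi * 900 * t))  # tone inside the band
below = sosfilt(sos, np.sin(2 * np.pi * 100 * t))   # ~2.6 octaves below it

rms = lambda x: np.sqrt((x**2).mean())
# the in-band tone passes nearly unchanged; the 100 Hz tone is cut by roughly 60 dB
```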
The files are AAC, encoded using multiple passes in a high quality commercial encoder, and put in m4a wrappers.