Vacuum Packed

You've reached the new home for Jay Rose's “dplay.com” articles and tutorials!

Compress audio files without losing quality? You can, if you measure them the right way.

[Figure: simplified diagram of how a sound gets digitized]

My last two blog tutorials discussed neural masking, and how an mp3 or AAC can be good enough for broadcast or film sound when you do it right. (If you followed the link to my website, you even got proof.)

But sometimes, even AAC's tiny losses can be too much: you might be sending elements that will be processed or compressed further, or be saving an archive. While most non-audio files can be successfully squeezed with WinZip or StuffIt, those processes behave strangely with audio.

Zip-like algorithms look for repeating patterns in a file, and create shortcuts for the ones they find; when you open a Zipped file, the software replaces the shortcuts with the original patterns.

That's fine for a document or spreadsheet. But audio files seldom have that kind of repetition. Even if a musical pattern or spoken phrase occurs twice, the audio sample data can be different because of tiny differences in performance or background noise.

Want proof? I created four sixty-second stereo files: a sinewave, a sweep, pink noise, and a typical voice/music mix. All were recorded at 16 bits 48 kHz, the usual standard for DV sound.

Sinewave:     A steady tone at a single frequency. I used 1 kHz at -20 dBFS, the lineup sound that's usually played with color bars.

Sweep:     A pure tone with a constantly changing frequency. In this case, it went from 20 Hz to 20 kHz at -20 dBFS over 30 seconds, then repeated.

Pink Noise:     A random "shhh" with equal energy in each octave, used for acoustic testing. This one was at -20 dBFS, and since it was generated in a computer with pseudo-random numbers, the pattern does repeat after a while.

All four files were 11.2 MB before compression. Zipping them gave me these results:

[Table: Zip results, in kilobytes]

If the numbers (shown in kilobytes) make your head spin, relax. I've boiled them down to a nice-looking graph, later in this article.

The sinewave was easily analyzed by the Zip process - measure one pure wave at a specific frequency, you've measured them all - and shrank to less than 2% of its original size. The sweep and pink noise were harder for Zip to deal with, but it found enough patterns to save about 25%. But who saves compressed test signals? The typical mix hardly shrank when Zipped; it wasn't worth the processing time.
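Here's a minimal Python sketch for anyone who wants to try a similar experiment. It makes several assumptions that differ from the test described here: zlib's DEFLATE stands in for a Zip utility, the signals are one second of mono rather than sixty of stereo, and the noise is white rather than pink. The helper names are invented for illustration, and the ratios won't match the figures reported in this article.

```python
import math
import random
import struct
import zlib

RATE = 48_000                            # samples per second
PEAK = round(32767 * 10 ** (-20 / 20))   # about -20 dBFS as a 16-bit value

def pcm16(samples):
    """Pack integers as little-endian 16-bit PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)

def sine(freq_hz, seconds=1):
    """A steady tone, rounded to whole sample values."""
    count = RATE * seconds
    return [round(PEAK * math.sin(2 * math.pi * freq_hz * i / RATE))
            for i in range(count)]

def white_noise(seconds=1, seed=0):
    """Uniform random samples; seeded so every run gives the same data."""
    rng = random.Random(seed)
    return [rng.randint(-PEAK, PEAK) for _ in range(RATE * seconds)]

def zip_ratio(samples):
    """Compressed size divided by original size (smaller is better)."""
    raw = pcm16(samples)
    return len(zlib.compress(raw, 9)) / len(raw)

if __name__ == "__main__":
    print(f"sine:  {zip_ratio(sine(1000)):.1%} of original size")
    print(f"noise: {zip_ratio(white_noise()):.1%} of original size")
```

The tone repeats exactly every 48 samples, so the pattern-matcher has an easy job. The noise gives it almost nothing to match.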

There is a better way to squeeze audio files, and still recover the sound with no loss.

Dawn of Delta

To understand how the better way works, think about what audio data actually means.

The top of this article has a simplified diagram of how a sound gets digitized. We're looking at half a sine wave at about -3 dBFS. Five times during that half-wave, we measure its voltage (blue lines) on a scale between 0 and roughly 32,000. The results - numbers between 2601 and 20,400 - are shown in the drawing.

Why ~32,000? Because 16-bit sound allows roughly 65,000 possible values. Half of those are reserved for negative numbers (the other half of the wave). Why only about 20,000 as our maximum voltage? So other sounds can be louder, up to 0 dBFS.

Why only 5 measurements? To simplify the drawing. If this were our 1 kHz test file, there'd be 24 of them.
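The arithmetic behind those answers is quick to check. A small Python sketch, with variable names chosen just for illustration:

```python
BITS = 16
RATE = 48_000    # samples per second
TONE = 1_000     # test tone frequency in Hz

total_values = 2 ** BITS               # every value 16 bits can hold
positive_max = total_values // 2 - 1   # largest positive sample value
samples_per_cycle = RATE // TONE       # samples in one full wave
samples_per_half = samples_per_cycle // 2

print(total_values, positive_max, samples_per_cycle, samples_per_half)
# prints: 65536 32767 48 24
```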

The highest numbers in our drawing (and their negative equivalents for the other half-wave) require 16 bits to store. If we could somehow make those numbers somewhat smaller, we could store them with fewer bits... saving file space.

It's not difficult.

[Figure: the same wave, labeled with deltas]

Here's the exact same wave. Only instead of noting the value of each sample by itself, we write down just the difference from the previous sample... mathematicians call it the delta.

Same wave, same numbers, just written differently. These smaller numbers will need fewer bits to store!

It's like if you gave walking directions in two different ways:

Turn right at your front door, go two blocks, turn left, go four blocks, right again one block and you're there.

or

Turn right at your front door, go two blocks, turn left and keep going until you're six blocks from home, right again until you're seven blocks from home...

Both directions will get you to the same place, but the first version is simpler.
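The delta idea itself takes only a few lines. This Python sketch is an illustration, not any particular codec: the function names are made up, and the test wave is an assumed single cycle of a 1 kHz tone at 48 kHz, peaking near -20 dBFS. The bit count is a rough one (largest magnitude plus a sign bit).

```python
import math

def to_deltas(samples):
    """Replace each sample with its difference from the previous one."""
    deltas, previous = [], 0
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas

def from_deltas(deltas):
    """Rebuild the original samples by adding the deltas back up."""
    samples, current = [], 0
    for delta in deltas:
        current += delta
        samples.append(current)
    return samples

def bits_needed(values):
    """Rough count: bits for the largest magnitude, plus one sign bit."""
    return max(abs(v) for v in values).bit_length() + 1

# One cycle of a 1 kHz tone sampled at 48 kHz (assumed peak value).
PEAK = 3277
wave = [round(PEAK * math.sin(2 * math.pi * i / 48)) for i in range(48)]
deltas = to_deltas(wave)

if __name__ == "__main__":
    print("samples need about", bits_needed(wave), "bits each")
    print("deltas need about ", bits_needed(deltas), "bits each")
    print("round trip exact: ", from_deltas(deltas) == wave)
```

Nothing is thrown away here: adding the deltas back up returns exactly the numbers you started with.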

Back when desktop computers didn't have enough power for psycho-acoustic algorithms, this is how audio data compression was done. You can still select it in most audio programs: QuickTime IMA, or Microsoft ADPCM. The delta measurements were arbitrarily limited to 4 bits instead of 16, for 1/4 the data.

Running our test files through IMA gives us the expected 75% reduction (plus a few bytes for overhead):

[Table: IMA results]

The only problem with this scheme was that it wasn't necessarily lossless. Sometimes, samples are more than 4 bits apart. In the case of sudden loud or high frequency sounds, the delta numbers would lag behind the proper values for a few samples. This would create a soft, short burst of noise around the signal.
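To see the lag in action, here's a deliberately simplified Python sketch. It is not the real IMA or Microsoft ADPCM algorithm, which also adapts its step size from sample to sample. It just clamps each delta to what a signed 4-bit number can hold (-8 to 7), an assumption made purely to isolate the effect. The function name is invented.

```python
def encode_clamped(samples, bits=4):
    """Delta-encode, but never let a delta exceed the signed range of `bits`.

    Returns the stored codes and what a decoder would reconstruct from them.
    """
    low = -(1 << (bits - 1))        # -8 for 4 bits
    high = (1 << (bits - 1)) - 1    #  7 for 4 bits
    codes, rebuilt, previous = [], [], 0
    for sample in samples:
        delta = max(low, min(high, sample - previous))
        previous += delta           # the decoder only ever sees the clamped delta
        codes.append(delta)
        rebuilt.append(previous)
    return codes, rebuilt

if __name__ == "__main__":
    original = [0, 3, 6, 40, 40, 40, 40, 40, 40]   # gentle start, then a sudden jump
    codes, rebuilt = encode_clamped(original)
    print("original:", original)
    print("rebuilt: ", rebuilt)
    # rebuilt:  [0, 3, 6, 13, 20, 27, 34, 40, 40]
```

The rebuilt signal needs four extra samples to catch up with the jump. Those wrong values are the short burst of noise.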

As soon as computers were able to handle the more efficient and better sounding masking algorithms, delta encoding was mostly abandoned.

Delta is Ready...

But Moore's Law still rules, and desktop computers keep getting more powerful.

Modern computers can look at a signal and predict the total delta between individual samples, no matter how big a jump. They're fast enough to check the guesses, and go back and refine them until they're accurate. They note the rules for that guesswork in the file, and voilà:

Reasonable shrinkage, with perfect recovery when you open the file.
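The predict-check-and-note idea can be sketched in Python as well. This is a toy under stated assumptions, not how Apple Lossless or FLAC is actually written: it tries three simple guessing rules (invented here for illustration), keeps whichever leaves the smallest errors, and stores that choice alongside the errors. Real encoders use more elaborate prediction and then pack the errors efficiently; none of that is reproduced here.

```python
import math

def predict(history, order):
    """Guess the next sample from the ones before it."""
    if order == 0:
        return 0                            # no guess: store the sample itself
    if order == 1:
        return history[-1]                  # same as last time: a plain delta
    return 2 * history[-1] - history[-2]    # continue the straight-line trend

def residuals(samples, order):
    """Errors between each real sample and the guess for it."""
    history, errors = [0, 0], []
    for sample in samples:
        errors.append(sample - predict(history, order))
        history.append(sample)
    return errors

def encode(samples):
    """Try each rule, keep the one with the smallest errors, note which it was."""
    best = min(range(3),
               key=lambda order: sum(abs(e) for e in residuals(samples, order)))
    return best, residuals(samples, best)

def decode(order, errors):
    """Make the same guesses again and correct each one with its stored error."""
    history = [0, 0]
    for error in errors:
        history.append(predict(history, order) + error)
    return history[2:]

# Ten cycles of an assumed 1 kHz test tone at 48 kHz.
wave = [round(3277 * math.sin(2 * math.pi * i / 48)) for i in range(480)]

if __name__ == "__main__":
    order, errors = encode(wave)
    print("rule chosen:", order)
    print("perfect recovery:", decode(order, errors) == wave)
```

Because the errors are stored exactly, no matter how big, the decoder always lands on the original numbers. A better guess only makes the errors smaller, and so cheaper to store.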

Here are the numbers for two implementations, Apple Lossless and FLAC (Free Lossless Audio Codec):

[Table: Apple Lossless and FLAC results]

Signals that are easier to predict will shrink more. That's why the sinewave loses about 90% (much smaller and more accurate than the original delta method).

But even more complex signals, like our voice/music mix, can shrink 50%. That's with absolutely no signal loss. When you open the compressed file, it's a perfect clone of the original.

And unlike most mp3 files, silences don't waste much space at all (since the delta remains 0 during a silence). So if you're sending stems, or individual tracks with pauses, you'll see even greater shrinkage.

Eye Candy

I've thrown some numbers around in this article. They may be easier to grasp as a chart:

[Chart: file sizes for each compression method]

The files compressed with IMA (delta 4:1) are nicely shrunk. But remember, this is a lossy compression... sudden jumps in the waveform get noisy. The mixed track doesn't shrink quite as much under Apple Lossless or FLAC, but it does end up considerably smaller than the Zipped version. And the process is as transparent - or lossless - as Zipping a Word doc before you email it.

FLAC is actually capable of greater compression, because its guesses can be fine-tuned. Here's a typical FLAC control panel:

[Image: FLAC control panel]

But Apple Lossless (like an equivalent setting in Windows Media) is a lot easier to use, and gives almost as good results. So next time you have to get small, take a trip to the Delta.

Next Time: A couple of unintuitive shortcuts that can speed up sending audio-for-video in QuickTime.