September – Bits and Bytes

Measuring Loudness and Normalization:

Measuring how loud something is seems like a relatively straightforward process. For example, it’s easy to determine that my house is much louder when my kids are home from school than when they’re not. But it is not always that simple.

Sound is an interesting phenomenon, capable of carrying information of endless complexity. In its simplest form, it is a single, pure tone – a sine wave: a repeating oscillation of whatever medium the sound is traveling through, with a single frequency, amplitude, and phase. For now, we will focus only on frequency and amplitude – which are basically just pitch and intensity. Single tones may be good for letting you know when your car door is open, but in terms of communicating complex information, you are pretty much stuck at the Morse code level. It’s not until we start combining thousands of these tones together, all at different frequencies and amplitudes, that we start to generate things like Foo Fighters songs.
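To make that a bit more concrete, here is a minimal sketch in Python with NumPy (the helper names here are mine, purely for illustration) of a single pure tone versus a sound built by stacking tones at different frequencies and amplitudes:

import numpy as np

SAMPLE_RATE = 48_000  # samples per second

def tone(freq_hz, amplitude, duration_s=1.0, phase=0.0):
    """A single pure tone: one frequency, one amplitude, one phase."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    return amplitude * np.sin(2 * np.pi * freq_hz * t + phase)

# A lone tone is enough for a "car door is open" chime...
chime = tone(440, 0.5)

# ...but richer sounds are built by summing many tones at different
# frequencies and amplitudes.
richer = sum(tone(f, a) for f, a in [(220, 0.5), (440, 0.25), (880, 0.12), (1760, 0.06)])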


If our goal is to figure out how loud a sound is, we might be tempted to measure the highest amplitude we can find over a given duration of time. This would certainly give us an accurate value for the peak loudness of a sound in the physical sense, but we have a slight problem: human ears have limitations that end up changing how the physical reality is perceived. Let’s imagine that we have a perfect recording system that can capture sound at any amplitude and at any frequency. Now, let’s say I stand in front of a microphone and recite a poem. We could look for the peak amplitude of my voice and likely end up with an accurate loudness value. This works only because my ears are well designed for detecting human speech frequencies. The physical reality of the sound matches very well with my perception of it.
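A naive meter along those lines might look like this hypothetical sketch, which simply reports the largest absolute sample value in a recording:

import numpy as np

def peak_dbfs(signal):
    """Peak level in dB relative to full scale (|sample| of 1.0 = 0 dBFS)."""
    peak = np.max(np.abs(signal))
    return 20 * np.log10(peak) if peak > 0 else -np.inf

# A voice-like 200 Hz tone at moderate amplitude:
t = np.arange(48_000) / 48_000
voice = 0.3 * np.sin(2 * np.pi * 200 * t)
print(peak_dbfs(voice))  # about -10.5 dBFS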

But now imagine that, for reasons I cannot explain, there is also someone in the room with me training their dog, and they blow an animal training whistle at full blast while I record. When I play back the recording, I would not hear the whistle at all – it has a frequency beyond what my ears can perceive. The dog has a different opinion. If we just looked for the highest amplitude in the sound, it would now be found in the whistle. While this would be a physically accurate value, it no longer represents human perception. It represents dog perception! My loudness meter would show me values from a sound that I am incapable of hearing.
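We can reproduce that failure numerically. In this sketch (the frequencies and amplitudes are made up for illustration), adding an ultrasonic tone shifts the reported peak to a sound no human hears:

import numpy as np

fs = 96_000  # a high sample rate so the ultrasonic whistle can be represented
t = np.arange(fs) / fs

voice = 0.3 * np.sin(2 * np.pi * 200 * t)       # audible, speech-range tone
whistle = 0.6 * np.sin(2 * np.pi * 25_000 * t)  # ~25 kHz: dogs yes, humans no

peak_dbfs = lambda x: 20 * np.log10(np.max(np.abs(x)))
print(peak_dbfs(voice))            # about -10.5 dBFS
print(peak_dbfs(voice + whistle))  # about -1 dBFS – the inaudible whistle wins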

While this animal whistle case is an extreme example, it illustrates a fundamental point about detecting loudness: when it comes to what humans can perceive, not all sound frequencies are created equal. To make matters worse, even within the window of sound frequencies that we can actually hear, our perception of those frequencies does not follow a straight, consistent line. It looks more like a wiggly bell curve.

It is clear that we need a method of measuring loudness that matches our ears’ limitations, and fortunately, we now have many. The most commonly used method is weighting curves. Essentially, a weighting curve is a collection of values that closely approximates human frequency perception. It is called a weighting curve because its values determine how much or how little a particular frequency matters when we are calculating the perceived loudness of a sound.
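A-weighting is probably the most familiar of these curves. As a minimal sketch, here is its standard response formula (from IEC 61672) in Python; the function name is mine:

import numpy as np

def a_weighting_db(f):
    """A-weighting in dB for frequency f in Hz (normalized to 0 dB at 1 kHz)."""
    f2 = np.asarray(f, dtype=float) ** 2
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20 * np.log10(ra) + 2.0  # +2.0 dB shifts the 1 kHz point to ~0 dB

print(a_weighting_db([100, 1000, 4000, 16000]))
# roughly [-19.1, 0.0, +1.0, -6.6]: deep bass and extreme treble count for less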

The lower the value, the less that frequency matters in our loudness calculation, and vice versa. By breaking a complex sound up into its component frequencies and then applying these weights accordingly, we get a loudness meter that shows values that generally match the average human experience.
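Here is a hedged sketch of that idea: split the signal into frequencies with an FFT, weight each bin, and sum the weighted energy into one relative level. Real meters (for example, ITU-R BS.1770 loudness meters) use carefully specified filters, gating, and calibration, so treat this as illustrative only:

import numpy as np
# assumes a_weighting_db() from the previous sketch

def weighted_level_db(signal, fs):
    """Relative A-weighted level: FFT -> per-frequency weights -> one number."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    with np.errstate(divide="ignore"):  # the 0 Hz bin weights to zero
        weights = 10 ** (a_weighting_db(freqs) / 10)
    weighted_power = np.sum(np.abs(spectrum) ** 2 * weights) / len(signal) ** 2
    return 10 * np.log10(weighted_power)

# A quiet 1 kHz tone plus a strong but barely perceptible 50 Hz rumble:
fs = 48_000
t = np.arange(fs) / fs
tone_1k = 0.1 * np.sin(2 * np.pi * 1000 * t)
rumble = 0.5 * np.sin(2 * np.pi * 50 * t)

print(weighted_level_db(tone_1k, fs))           # about -26 dB
print(weighted_level_db(tone_1k + rumble, fs))  # about -26 dB: the rumble barely registers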

In the streaming world, we primarily rely on these types of perception models for loudness adjustment of injected ad content. Under typical broadcast circumstances, you have very tight control over the dynamics of the content and the signal paths it flows through. With dynamic streaming content, you begin to run into things like per-listener injected ads, which can come from any number of possible sources. Injected ads become both an awesome capability and a difficult thing to manage sonically.

While production facilities will typically try to adhere to a loudness standard, the loudness of what we get back may vary depending on which standard was selected, which mastering technique was used, and so on. Quite a number of subjective factors could be at play. If we ensure that the encoder input of the broadcast audio targets a predetermined loudness value, we can then use these perception models to adjust the ad we inject to the same level, even if that ad happens to have an animal whistle playing in the background.
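The adjustment itself is simple once the measurement can be trusted. A hypothetical sketch (the target value is made up, and weighted_level_db() is the toy meter from earlier; production systems typically measure in LUFS per ITU-R BS.1770):

import numpy as np
# assumes weighted_level_db() from the previous sketch

TARGET_DB = -24.0  # hypothetical house loudness target

def normalize_to_target(ad_audio, fs, target_db=TARGET_DB):
    """Scale the ad so its measured perceived loudness matches the target."""
    measured = weighted_level_db(ad_audio, fs)
    gain_db = target_db - measured          # how far off the ad is
    return ad_audio * 10 ** (gain_db / 20)  # apply as a simple linear gain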

This is why having an accurate loudness measurement is so critical. If our loudness measurement took the animal whistle into account, we would electronically see the ad as too loud, even if the actual ad content was well within the normal perception range. The result would be a system that tries to lower the volume of an imperceptible tone, which forces the content you can actually hear to be too quiet. In the real world, it isn’t a whistle. It may be a cymbal crash, sibilance (“s” and “sh” sounds) – potentially any sound with complex frequency content and harmonics. We can save those details for another day.

Of course, this is just the tip of a very large, lumpy, subjective iceberg. There are many other components and highly complex techniques involved in really polishing up audio from different sources – enough to fill many volumes of sometimes hard-to-read books. But I hope I was at least able to provide you with a little more clarity about the loudness of things.

-Written by Brian Bosworth, Principal Engineer, AmperWave

Photo Credit Gettyimages/Svetlana Apukhtina

Photo Credit Gettyimages/mehmetbuma