Another operation that often changes peak levels of the file is lossy compression. It is a very common source of clipping. In fact, most music nowadays is distributed in compressed formats: either mp3 (MPEG-1 Layer III) or AAC (Advanced Audio Coding).
These compression algorithms reduce the size of CD-quality audio by a factor of 5–10, depending on the chosen bitrate. This is far stronger compression than the typical 2x ratio achievable by lossless codecs such as FLAC or ALAC. Inevitably, a signal encoded by mp3 or AAC cannot be preserved exactly. These algorithms create an approximation of the signal that sounds as close to the original as possible.
Lossy encoders produce a credibly sounding approximation of the waveform.
Lossy encoding can be viewed as a low-bit-depth quantization of a signal. The precision of this quantization depends on the selected bitrate, while quantization noise (a compression error: the difference between the original and the decoded signal) is spectrally shaped to be minimally audible — this is achieved by a psychoacoustic model.
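To get a feel for what "low-bit-depth quantization" means in numbers, here is a minimal sketch that rounds a test tone to a few bits per sample and measures the resulting noise. It is only an analogy: real mp3 and AAC encoders quantize spectral coefficients band by band under a psychoacoustic model, not time-domain samples, and the 997 Hz tone, the bit depths and the helper names are my own assumptions.

```python
import math

def quantize(x, bits):
    """Round x (nominally in [-1, 1]) to the nearest level on a grid of step 2 / 2**bits."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def snr_db(signal, approx):
    """Signal-to-noise ratio in dB, where noise = signal - approximation."""
    sig_power = sum(s * s for s in signal)
    err_power = sum((s - a) ** 2 for s, a in zip(signal, approx))
    return 10.0 * math.log10(sig_power / err_power)

# Test tone: one second of a full-scale 997 Hz sine (hypothetical choice)
SAMPLE_RATE = 44100
tone = [math.sin(2 * math.pi * 997.0 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

snr_by_bits = {}
for bits in (3, 4, 5):
    coded = [quantize(s, bits) for s in tone]
    snr_by_bits[bits] = snr_db(tone, coded)
    # 6.02 * bits + 1.76 dB is the textbook estimate for a full-scale sine
    print(f"{bits} bits: measured {snr_by_bits[bits]:.1f} dB, "
          f"rule of thumb {6.02 * bits + 1.76:.1f} dB")
```

Each extra bit buys roughly 6 dB of signal-to-noise ratio, which is the sense in which a higher bitrate gives the encoder more precision to spend.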
The amplitude of quantization noise depends on the chosen bitrate and signal complexity. Slowly changing tonal signals are easy to approximate, while random noise is hard (see some examples below). The amplitude of compression noise is often proportional to the signal level, much like with a 32-bit float sample format. The noise of a 32-bit float format stays roughly 150 dB below the signal at any level, while the noise of mp3 or AAC compression is usually only 15–30 dB below the signal level.
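The float comparison can be checked directly. The sketch below rounds a sine to 32-bit float precision at three very different levels and measures the rounding noise relative to the signal each time. The test signal and the helper names are assumptions of mine; the rounding itself goes through Python's standard `struct` module.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_snr_db(amplitude, count=20000):
    """SNR in dB of a sine of the given amplitude after rounding to 32-bit float."""
    signal = [amplitude * math.sin(0.1234567 * n + 0.5) for n in range(count)]
    rounded = [to_float32(s) for s in signal]
    sig_power = sum(s * s for s in signal)
    err_power = sum((s - r) ** 2 for s, r in zip(signal, rounded))
    return 10.0 * math.log10(sig_power / err_power)

# Full scale, -60 dBFS and -120 dBFS
float_snr = {amp: float32_snr_db(amp) for amp in (1.0, 1e-3, 1e-6)}
for amp, value in float_snr.items():
    print(f"amplitude {amp:g}: noise is {value:.1f} dB below the signal")
```

The noise floor follows the signal down instead of staying at a fixed absolute level, which is the behaviour that compression noise shares with float formats, only with a far smaller margin.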
When quantization noise is added to the waveform, it can change its peak levels. If the waveform has been brickwall-limited to a certain level, chances are that about 50% of the waveform peaks will rise in level, because the noise is as likely to add to a peak as to subtract from it.
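The claim that about half of the limited peaks rise can be illustrated with a toy model. This is a sketch under loud assumptions: the "limiter" is a plain hard clip at −1 dBFS sample peak, and the coding noise is modelled as zero-mean Gaussian noise 20 dB below the signal RMS. Real codec noise is neither white nor independent of the signal, so only the symmetry argument carries over, not the exact figures.

```python
import math
import random

CEILING = 10 ** (-1.0 / 20)     # hypothetical limiter ceiling: -1 dBFS, sample peak
NOISE_BELOW_SIGNAL_DB = 20.0    # assumed coding-noise level relative to signal RMS

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def peak_db(values):
    return 20.0 * math.log10(max(abs(v) for v in values))

rng = random.Random(1)  # fixed seed so the run is repeatable

# A loud sine pushed into a crude brickwall "limiter" (hard clip at the ceiling)
limited = [max(-CEILING, min(CEILING, 2.0 * math.sin(2 * math.pi * 0.0101 * n)))
           for n in range(50000)]

# Stand-in for encode + decode: add zero-mean noise scaled to the signal level
noise_rms = rms(limited) * 10 ** (-NOISE_BELOW_SIGNAL_DB / 20)
decoded = [s + rng.gauss(0.0, noise_rms) for s in limited]

# Look only at the samples that were sitting exactly on the ceiling
at_ceiling = [i for i, s in enumerate(limited) if abs(s) == CEILING]
raised = sum(1 for i in at_ceiling if abs(decoded[i]) > CEILING)
fraction_raised = raised / len(at_ceiling)
peak_rise_db = peak_db(decoded) - peak_db(limited)

print(f"{fraction_raised:.0%} of the ceiling samples ended up above the ceiling")
print(f"highest peak rose by {peak_rise_db:.2f} dB")
```

Because every limited sample sits exactly at the ceiling, any positive noise value pushes it over, and with thousands of such samples some of them meet the largest noise values.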
This increase in levels is often wrongly attributed to ISPs, that is, intersample peaks (or true peaks). But, in fact, it has little to do with ISPs. In the waveform above, true peaks have been limited to −1 dBTP, but after lossy compression both sample peaks and true peaks are significantly higher. The cause of this increase is the quantization that happens during lossy compression.
Lossy compression is often easy to identify by looking at the spectrogram. The upper frequencies are completely cut (the psychoacoustic model finds them inaudible) and the cutoff line is serrated, with occasional “black holes” below the cutoff. Signals at middle frequencies are typically preserved much better, because they matter more for perception. The goal of a psychoacoustic model is to allocate more bits to the spectrogram bins that have a higher chance of being audible and to keep the shaped quantization noise below the masking threshold.
Lossy encoding often creates a serrated cutoff at higher frequencies.
When mastering your recording, always assume that it is likely to go through lossy compression somewhere down the line, completely out of your control. How much headroom is required to prevent clipping? There is no clear answer: it depends on the encoder, the bitrate, and the signal itself. The most typical recommendation, from “Mastering for iTunes”, is to keep true peak levels at or under −1 dBTP. As can be seen from the sample above, this is not always sufficient to prevent clipping (the true peaks of the mp3-encoded Music.wav rise by 1.73 dB in my example, which puts them 0.73 dB above full scale). But the goal here is to prevent most of the clipping, not all of it. In fact, there are some pathological cases where peak levels rise by as much as 10 dB after lossy compression. Below is a sample of white noise with a binary p.d.f. (each sample takes one of only two values) that experiences a dramatic rise in peak levels after either mp3 or AAC compression.
Interestingly, any clipping that happens because of peak level increase during lossy encoding is reversible! It happens during file decoding, while the internal representation of an mp3 (or AAC) file is not clipped — very much like a floating-point sample format. Some decoders are smart and able to apply some negative gain to prevent clipping. Others can decode to a non-clipping 32-bit float format, where you can manually take care of any overshoots. Unfortunately, most decoders are dumb: they decode to a 16-bit sample format and clip. So, the safest way to prevent clipping of mp3 or AAC files is to keep some headroom. Even half a decibel of headroom will eliminate most of the audible clipping.
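The difference between a clipping decoder and a float decode followed by negative gain can be sketched in a few lines. Everything here is a stand-in: the "decoded" signal is a synthetic sine peaking 0.73 dB over full scale (a −1 dB ceiling plus the 1.73 dB rise from my example), and `to_int16_clipped` and `gain_to_fit` are hypothetical helpers, not the API of any real decoder.

```python
import math

def to_int16_clipped(samples):
    """What a simple decoder does: scale to 16-bit integers and clip the overs."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]

def gain_to_fit(samples, ceiling=1.0):
    """Largest gain (never above 1.0) that brings the highest peak down to the ceiling."""
    peak = max(abs(s) for s in samples)
    return min(1.0, ceiling / peak) if peak > 0 else 1.0

# Stand-in for a float decode with overshoots: a sine peaking at +0.73 dBFS
overshoot_db = -1.0 + 1.73
decoded = [10 ** (overshoot_db / 20) * math.sin(2 * math.pi * n / 64)
           for n in range(640)]

# Path 1: straight to 16 bits, every sample above full scale gets flattened
clipped = to_int16_clipped(decoded)
clipped_count = sum(1 for s in decoded if abs(s) > 1.0)

# Path 2: keep the float data, pull the level down first, then convert
gain = gain_to_fit(decoded)
gain_db = 20 * math.log10(gain)
safe = to_int16_clipped([s * gain for s in decoded])
flat_in_safe = sum(1 for v in safe if abs(v) >= 32767)

print(f"{clipped_count} samples flattened by the clipping path")
print(f"gain of {gain_db:.2f} dB avoids clipping; "
      f"{flat_in_safe} samples touch full scale afterwards")
```

The second path costs less than a decibel of level and keeps the shape of every peak, which is why a decoder that outputs float, or applies the gain itself, makes this kind of clipping avoidable.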
Decoding to float in RX
MP3 has generous headroom. Other codecs like ac3, mpc, or vorbis saturate much sooner above full scale, and lose accuracy at low levels. The most precise decoders, mpglib and the one included with Reaper, decode without distortion between -200 and +60 dBFS. The optimized ffmpeg decoder has about 150 dB of dynamic range.
Fewer applications can encode overs that occur in DSP operations, even if the format does offer the headroom. Current versions of Reaper clip output to all lossy formats.
If mastering engineers were in control over the encode, I guess they could utilize the extended range as an alternative to hard clipping. I haven't observed wrap-around overflow with mp3.
In my opinion, this matter of ISPs was blown out of proportion by companies desiring to sell meters and limiters. The 1 or 2 dB range reserved for peaks would be better used for encoding musical dynamic range. Even if a "dumb" decoder is used, intersample overs do not occur if the signal is attenuated with a typical digital volume control, and the lossy codec or SRC peaks that do occur are short and unlikely to be audible.