iZotope RX 7: a summary of new features


  • Music Rebalance module for separation of music into 4 stems: vocals, drums, bass, melodic instruments.

  • Dialogue De-reverb module for reverb attenuation in speech. The algorithm is different from the previously existing De-reverb module.

  • Dialogue Contour module for time-variable pitch shifting of speech or vocals (with formant preservation).

  • Variable Time module for time-variable time stretching (without pitch shift) with a control curve; uses the Radius algorithm.

  • Variable Pitch module for time-variable pitch shifting (without time stretching) with a control curve; uses the Radius algorithm. The functionality previously known as "Pitch Contour" (curve-controlled resampling) has also been merged into this module.

Improved DSP

  • De-rustle and Dialogue Isolate have got slightly better processing quality for stereo and multichannel files. They are also now available as AudioSuite plug-ins.

UI and workflow

  • Support of multichannel (surround) files, up to 10 channels.

  • Repair Assistant analyzes your audio for a few common problems (clipping, clicks, hum, noise) and suggests solutions as module chains.


iZotope RX 6: a summary of new features

  • De-bleed module for reduction of cross-bleed between mics. It subtracts one track from the other.

  • De-ess module and plugin for sibilance reduction with a new spectral shaping algorithm.

  • De-rustle module for reduction of lavalier mic rustle in speech.

  • De-wind module for reduction of intermittent low-frequency wind rumble in speech.

  • Dialogue Isolate module for separation of speech from various noises (including non-stationary noises).

  • Mouth De-click module and plugin for reduction of lip smacks and mouth clicks.

Improved DSP

  • Ambience Match has got better level matching of synthesized room tone.

  • Breath Control module has our De-breath algorithm available separately from the Leveler module.

  • Center Extract module (previously in Channel Ops) has got an improved algorithm with extra controls for artifact reduction.

  • De-click module and plugin now have a Low-latency algorithm for faster operation and better results on mouth clicks.

  • De-plosive module and plugin have got an improved plosive detection algorithm.

  • Deconstruct module can now separate signals into 3 components: tones, noise, and transients. It also features the improved algorithm with artifact smoothing.

  • Spectral Repair's Attenuate mode now behaves differently when Strength is less than 1.

  • Voice De-noise (previously Dialogue Denoiser) now has a Music mode with slower gate ballistics and slower adaptive learning.

  • Find Similar algorithm has been significantly improved and now allows selecting all similar events together.

UI and workflow

  • Most modules with multiple tabs (e.g. De-click, De-noise, Channel Ops) have been split into separate modules.

  • Mp3 export is now available.

  • Save and Save As now work like they did prior to RX 4: they overwrite the original file or save it under a different name.

  • Composite view allows joining multiple files into one and editing them together (e.g., Spectral Repair a selection across all tracks).

  • Module list has become quite long and now has presets for showing/hiding the modules.

  • Export Regions to Files now also works with markers.

  • Export Screenshot can now save GIF files with animated selections.


Lossy encoding and peak levels
Part 1: SRC and peak levels

Another operation that often changes peak levels of the file is lossy compression. It is a very common source of clipping. In fact, most music nowadays is distributed in compressed formats: either mp3 (MPEG-1 Layer III) or AAC (Advanced Audio Coding).

These compression algorithms reduce the size of CD-quality audio by a factor of 5–10, depending on the chosen bitrate. This is far stronger compression than the typical 2x ratio achievable by lossless codecs such as FLAC or ALAC. Inevitably, the signal encoded by mp3 or AAC cannot be preserved exactly. These algorithms create an approximation of the signal that sounds as close to the original as possible.

Lossy encoders produce a credible-sounding approximation of the waveform.

Lossy encoding can be viewed as a low-bit-depth quantization of a signal. The precision of this quantization depends on the selected bitrate, while quantization noise (a compression error: the difference between the original and the decoded signal) is spectrally shaped to be minimally audible — this is achieved by a psychoacoustic model.

The amplitude of quantization noise depends on the chosen bitrate and signal complexity. Slowly-changing tonal signals are easy to approximate, while random noises are hard (see some examples below). The amplitude of compression noise is often proportional to the signal level, much like with a 32-bit float sample format. The noise of a 32-bit float format is always 150 dB lower than the signal, while the noise of mp3 or AAC compression is usually only 15–30 dB below the signal level.
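The "noise proportional to signal level" behaviour of a 32-bit float format can be checked with a short sketch. It rounds a test tone to 32-bit float and measures the signal-to-noise ratio at two levels. The test tone, its frequency, and the helper names are illustrative assumptions, not part of any codec; the result should land near the 150 dB figure quoted above at both levels.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def snr_db(signal, approx):
    """Signal-to-noise ratio in dB, where noise = approx - signal."""
    sig_power = sum(s * s for s in signal)
    noise_power = sum((a - s) ** 2 for s, a in zip(signal, approx))
    return 10.0 * math.log10(sig_power / noise_power)

def float32_snr(amplitude: float, n: int = 10000) -> float:
    # Hypothetical test tone: a sine at an arbitrary, non-harmonic frequency.
    tone = [amplitude * math.sin(0.1234567 * i) for i in range(n)]
    return snr_db(tone, [to_float32(s) for s in tone])

loud = float32_snr(1.0)      # full-scale tone
quiet = float32_snr(0.001)   # the same tone, 60 dB quieter
print(f"SNR at full scale: {loud:.1f} dB, at -60 dB: {quiet:.1f} dB")
```

The two numbers stay close to each other because the rounding error of a floating-point format scales with the sample value. A lossy codec behaves similarly in this respect, only with a far smaller signal-to-noise ratio.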

When quantization noise is added to the waveform, it can change its peak levels. If the waveform has been brickwall-limited to a certain level, chances are that about 50% of the limited peaks will rise in level, because the noise is as likely to push a peak up as down.

Lossy encoding increases peak levels of the waveform: Music.wav

This increase in levels is often wrongly attributed to ISPs — intersample peaks (or true peaks). But, in fact, it has little to do with ISPs. In the waveform above, true peaks have been limited to −1 dBTP, but after lossy compression both sample peaks and true peaks are significantly higher. The cause of this increase is quantization happening during lossy compression.
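A minimal sketch of this effect, under loud assumptions: the "music" is Gaussian noise hard-clipped at a ceiling (a crude stand-in for a brickwall limiter), and the "compression error" is white noise 20 dB below the signal RMS (a crude stand-in for codec quantization noise). No real encoder is involved, and all constants are hypothetical.

```python
import math
import random

random.seed(7)  # fixed seed so the sketch is repeatable

CEILING = 0.5            # hypothetical limiter ceiling (about -6 dBFS)
NOISE_BELOW_DB = 20.0    # assumed noise level relative to the signal RMS

# Stand-in for limited music: Gaussian noise hard-clipped at the ceiling.
signal = [max(-CEILING, min(CEILING, random.gauss(0.0, 0.4))) for _ in range(20000)]
rms = math.sqrt(sum(s * s for s in signal) / len(signal))

# Stand-in for compression error: white Gaussian noise below the signal RMS.
noise_rms = rms * 10.0 ** (-NOISE_BELOW_DB / 20.0)
decoded = [s + random.gauss(0.0, noise_rms) for s in signal]

# Samples that sat exactly at the ceiling, and how many of them went over it.
at_ceiling = [i for i, s in enumerate(signal) if abs(s) == CEILING]
rose = sum(1 for i in at_ceiling if abs(decoded[i]) > CEILING)

peak_before = max(abs(s) for s in signal)
peak_after = max(abs(s) for s in decoded)
rise_db = 20.0 * math.log10(peak_after / peak_before)
print(f"{100.0 * rose / len(at_ceiling):.0f}% of limited peaks rose; "
      f"sample peak rose by {rise_db:.2f} dB")
```

No resampling or intersample reconstruction happens here, yet the sample peaks still go over the ceiling. That is the point of the paragraph above: added noise alone is enough.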

Lossy compression is often easy to identify by looking at the spectrogram. The upper frequencies are completely cut (the psychoacoustic model finds them inaudible) and the cutoff line is serrated, with occasional “black holes” below the cutoff. Signals at middle frequencies are typically preserved much better, because they matter more for perception. The goal of a psychoacoustic model is to allocate more bits to spectrogram bins that have a higher chance of being audible and to shape quantization noise below the masking threshold.

Lossy encoding often creates a serrated cutoff at higher frequencies.

When mastering your recording, keep in mind that it will very likely undergo lossy compression somewhere down the line, completely out of your control. How much headroom is required to prevent clipping? There is no clear answer: it depends on the encoder, the bitrate, and the signal itself. The most typical recommendation from “Mastering for iTunes” is to keep true peak levels at or under −1 dBTP. As can be seen from the sample above, this is not always sufficient to prevent clipping (the true peaks of the mp3-encoded Music.wav rise by 1.73 dB in my example). But the goal here is to prevent most of the clipping, not all of it. In fact, there are some pathological cases where peak levels rise by as much as 10 dB after lossy compression. Below is a sample of white noise with a binary p.d.f. that experiences a dramatic rise in peak levels after either mp3 or AAC compression.

A pathological case of peak level increase after lossy compression: Noise.wav

Interestingly, any clipping that happens because of peak level increase during lossy encoding is reversible! It happens during file decoding, while the internal representation of an mp3 (or AAC) file is not clipped — very much like a floating-point sample format. Some decoders are smart and able to apply some negative gain to prevent clipping. Others can decode to a non-clipping 32-bit float format, where you can manually take care of any overshoots. Unfortunately, most decoders are dumb: they decode to a 16-bit sample format and clip. So, the safest way to prevent clipping of mp3 or AAC files is to keep some headroom. Even half a decibel of headroom will eliminate most of the audible clipping.
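The difference between a "dumb" and a "smart" decoder output stage can be sketched as follows. This is not the code of any real decoder: the function names and the sample values are hypothetical, and the float list stands in for the decoder's unclipped internal output.

```python
def to_int16_clipping(samples):
    """'Dumb' behaviour: convert to 16-bit and clip any overshoots."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]

def to_int16_with_gain(samples):
    """'Smart' behaviour: attenuate just enough for the peak to fit, then convert."""
    peak = max(abs(s) for s in samples)
    gain = 1.0 / peak if peak > 1.0 else 1.0
    return [round(s * gain * 32767) for s in samples], gain

# Hypothetical decoded float samples; two of them overshoot full scale (1.0).
decoded = [0.2, 0.9, 1.22, -1.1, 0.5]

clipped = to_int16_clipping(decoded)
scaled, gain = to_int16_with_gain(decoded)
print("clipped:", clipped)
print("scaled: ", scaled, f"(gain {gain:.3f})")
```

In the clipped version the two overshoots are flattened to full scale and their shape is lost. In the scaled version every sample keeps its proportion to the others, at the cost of a slightly quieter file.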

iZotope RX 5: a summary of new features

  • De-plosive module detects and attenuates mic pops on transient sounds like 'p', 'b', etc.

  • Signal Generator module can insert silence, add tones, noises, and DC offset.

Improved DSP

  • Leveler module has got De-breath and De-ess features. The leveling algorithm has been updated as well.

  • Corrective EQ is a new name for the older EQ module. The accuracy of notch filtering has been improved in linear-phase mode.

  • Faster calculation of Waveform Statistics on a Mac, more accurate detection of True Peak levels.

UI and workflow

  • RX Connect has got a new clip-by-clip mode in Pro Tools, as well as the ability to work with handles. Read the manual on how to enable this feature in Pro Tools.

  • Ambience Match is now available as an AudioSuite plugin in RX Advanced.

  • De-reverb is now available in RX Standard.

  • Instant Process tool applies your selected processing immediately while you are making a selection.

  • Module chain allows running multiple operations on the same file and saving them as a preset.

  • Support of Retina displays gets you better resolution on high-DPI displays on a Mac.

  • Updated icons, search in markers and regions.


  • Several buttons have been removed from the menu toolbar: invert selection, zoom to left/right edges of the selection. They are still available through the menu or as keyboard shortcuts.

  • Signal Generator can be found in Window > Modules menu.


RX Loudness Control

RX Loudness Control is a new product for ensuring loudness compliance of a mix. As of version 1.0, it only works in Pro Tools, Media Composer, and Adobe Premiere Pro CC. It applies global gain to meet the Integrated loudness spec and IRC2 limiting to meet the Max true peak spec. Optionally, an RMS limiter ensures that Short-term or Momentary loudness does not exceed the spec either, as required by the EBU R128 s1 standard.
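The arithmetic behind "global gain, then limiting" can be sketched in a few lines. This is only an illustration of the idea, not the product's algorithm: the function name is hypothetical, the defaults are the EBU R128 targets (−23 LUFS integrated, −1 dBTP maximum), and the measured values in the example are made up.

```python
def loudness_gain_plan(measured_lufs, measured_tp_db,
                       target_lufs=-23.0, max_tp_db=-1.0):
    """Return (global gain in dB, true-peak overshoot in dB left for a limiter)."""
    gain_db = target_lufs - measured_lufs          # gain needed to hit the target
    tp_after_gain = measured_tp_db + gain_db       # true peak after that gain
    limiting_db = max(0.0, tp_after_gain - max_tp_db)
    return gain_db, limiting_db

# Hypothetical mix: integrated loudness -27.5 LUFS, true peak -3.0 dBTP.
gain_db, limiting_db = loudness_gain_plan(-27.5, -3.0)
print(f"apply {gain_db:+.1f} dB of gain, then limit peaks by up to {limiting_db:.1f} dB")
```

In this made-up case the mix needs 4.5 dB of gain, which would push the true peak to +1.5 dBTP, so the limiter has to take up to 2.5 dB off the highest peaks.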

SRC and peak levels

It is often assumed that resampling (SRC) or dither are “safe” operations that do not change peak levels of the file and cannot cause clipping. If the file clips, people are surprised and suspect there is something wrong with the SRC algorithm.

In fact, both SRC and dithering can increase peak levels of the file and cause clipping.
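A small sketch of the SRC case, with assumptions stated up front: the test signal is a sine at a quarter of the sample rate whose samples all fall 45° away from the crests, and the resampler is a plain Hann-windowed sinc interpolator written for this example (a simple stand-in, not the iZotope SRC).

```python
import math

def sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def upsample_2x(x, half_width=16):
    """2x upsampling by Hann-windowed sinc interpolation (illustrative only)."""
    out = []
    for m in range(2 * len(x)):
        t = m / 2.0                       # output position in input-sample units
        first = max(0, int(t) - half_width)
        last = min(len(x), int(t) + half_width + 1)
        acc = 0.0
        for k in range(first, last):
            d = t - k
            window = 0.5 * (1.0 + math.cos(math.pi * d / (half_width + 1)))
            acc += x[k] * sinc(d) * window
        out.append(acc)
    return out

# Sine at fs/4 with a 45-degree phase offset, normalized so the sample peak is 1.0.
scale = 1.0 / math.sin(math.pi / 4.0)
x = [scale * math.sin(math.pi / 2.0 * n + math.pi / 4.0) for n in range(200)]
y = upsample_2x(x)

peak_before = max(abs(v) for v in x)
peak_after = max(abs(v) for v in y[64:-64])   # skip the edges of the file
print(f"sample peak before: {peak_before:.3f}, after 2x SRC: {peak_after:.3f} "
      f"({20.0 * math.log10(peak_after / peak_before):+.2f} dB)")
```

The original samples never reach the crests of the underlying sine, so the file looks as if it peaks at full scale. The new samples created by the resampler land on those crests, and the sample peak goes up by about 3 dB without any fault in the SRC.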

Ozone 6 DSP gotchas

  • Incorrect sequence of modules (Maximizer > Dither > DC offset) prevents Maximizer and Dither from working properly when DC offset is enabled. Until the sequence is patched to DC offset > Maximizer > Dither, it is not recommended to use the DC offset feature. Possible manifestations: higher-than-specified bit depth, non-brickwall limiting.

  • Resampling the file during Export destroys dither and leads to truncation. It may also cause clipping because peak levels may change.

  • Inaccurate metering in Vectorscope: seems to be dependent on the buffer size and contains dropouts. Ozone 5 or Insight are recommended as temporary replacements.

  • DC offset meter shows some offset even when there is none.

  • ISP protection in Maximizer is still available under the name True Peak Limiting: there's a button under the threshold slider (but read the Maximizer / DC offset gotcha above).


Manual or Auto in Dialogue Denoiser?

When you know that the noise floor is constant (does not change over time) and you have time to do manual learning — go for Manual. It will do less harm to the signal because you'll tell Denoiser precisely what the noise is.
When the noise floor is changing, or you need the fastest result, or no single noise fragment is available for manual learning — go for Auto. It will do a good job on speech, but beware: it can be harmful to music.

Similar principles apply to Spectral Denoiser, although it is more flexible: you can train it on multiple disjoint selections.

DSP changes in RX 4.01
Aside from numerous UI/workflow changes and bug fixes, RX 4.01 has the following DSP updates:

  • Resampler: fixed a minor quality regression in iZotope 64-bit SRC (which has been updated in RX 4).

  • Hum Removal: fixed latency compensation.


Null-testing in RX

Null-testing is a simple way to check if two signals are identical: one signal is inverted and mixed with the other one. If the difference is zero, the two signals are the same. If not, the level of the difference signal can be easily measured.

There are a couple of caveats in this technique:

  1. If one signal is time-shifted with respect to the other, they sound the same, but the null-test will not pass unless the two signals are precisely aligned. Often, the shift between two signals is not an integer number of samples. It requires a sub-sample alignment, which is hard to do precisely. This can be done with RX's Azimuth Alignment algorithm in Channel Operations. Sometimes the delay between two signals is even frequency-dependent, such as after a nonlinear-phase EQ.

  2. Low (or zero) amplitude of the difference signal indicates that two signals are close (or identical). However, high amplitude of the difference signal does not mean that the two signals sound different. A high difference may be due to time delay, phase inversion, or minor frequency or phase response discrepancies. These things are often inaudible, but easily affect a null-test. So, the mathematical difference does not always indicate a problem with quality.

In newer versions of RX, null-testing is simple: select one sample, copy it into the clipboard (Ctrl+C), and paste with inversion into the second sample (Alt+V).
Older versions of RX require phase inversion in Channel Operations and mix-pasting.
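
Outside of RX, the same test and its two caveats can be sketched in a few lines. The helper name and the 1 kHz test tone at 48 kHz are assumptions made for this example.

```python
import math

def null_test_db(a, b):
    """Peak level of (a - b) in dB relative to the peak of a; -inf is a perfect null."""
    diff_peak = max(abs(x - y) for x, y in zip(a, b))
    ref_peak = max(abs(x) for x in a)
    if diff_peak == 0.0:
        return -math.inf
    return 20.0 * math.log10(diff_peak / ref_peak)

# Hypothetical test signal: 0.1 s of a 1 kHz sine at 48 kHz.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / 48000.0) for n in range(4800)]

identical = list(tone)                 # exact copy
delayed = [0.0] + tone[:-1]            # same tone, one sample late
inverted = [-s for s in tone]          # same tone, polarity flipped

print("identical:", null_test_db(tone, identical))
print("delayed:  ", round(null_test_db(tone, delayed), 2), "dB")
print("inverted: ", round(null_test_db(tone, inverted), 2), "dB")
```

The exact copy nulls completely. The delayed and inverted copies sound the same as the original when played on their own, yet they leave a large difference signal, which is what the caveats above warn about.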