Music De-limiter Networks via Sample-wise Gain Inversion

Chang-Bin Jeon, Kyogu Lee

2024-06-23

Abstract: The loudness war, an ongoing phenomenon in the music industry characterized by the increasing final loudness of music while reducing its dynamic range, has been a controversial topic for decades. Music mastering engineers have used limiters to heavily compress and make music louder, which can induce ear fatigue and hearing loss in listeners. In this paper, we introduce music de-limiter networks that estimate uncompressed music from heavily compressed signals. Inspired by the principle of a limiter, which performs sample-wise gain reduction of a given signal, we propose the framework of sample-wise gain inversion (SGI). We also present the musdb-XL-train dataset, consisting of 300k segments created by applying a commercial limiter plug-in for training real-world friendly de-limiter networks. Our proposed de-limiter network achieves excellent performance with a scale-invariant source-to-distortion ratio (SI-SDR) of 24.0 dB in reconstructing musdb-HQ from musdb-XL data, a limiter-applied version of musdb-HQ. The training data, code, and model weights are available in our repository (https://github.com/jeonchangbin49/De-limiter).

Sound, Audio and Speech Processing

What problem does this paper attempt to address?

This paper addresses the "loudness war" in the music industry. To make music sound louder, many producers and mastering engineers use limiters to heavily compress audio signals. This raises the final loudness, but it also reduces the dynamic range of the music and may cause ear fatigue and hearing damage. In response, the authors propose music de-limiter networks based on the sample-wise gain inversion (SGI) framework, which aims to recover the original uncompressed signal from heavily compressed music, thereby improving sound quality and protecting listeners' hearing. The authors also construct a dataset named musdb-XL-train, built by applying a commercial limiter plug-in, so that the de-limiter networks can be trained for real-world use. With this approach, the authors hope both to improve the listening experience and to give music creators more creative freedom, since they can sample and create from audio material that is closer to its original state.
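The idea of sample-wise gain inversion and the SI-SDR metric can be illustrated with a small NumPy sketch. This is not the authors' implementation. The hard-clipping `toy_limiter_gain` is a hypothetical stand-in for a real limiter (which would also smooth its gain with attack and release times), and the "oracle" inverse gain below is computed from the known gain, whereas a de-limiter network would have to estimate it from the limited signal alone. All function and variable names are assumptions made for illustration.

```python
import numpy as np


def toy_limiter_gain(x, threshold=0.5):
    """Per-sample gain g[n] in (0, 1] such that |g[n] * x[n]| <= threshold.

    Hypothetical stand-in for a limiter: no attack/release smoothing.
    """
    mag = np.abs(x)
    return np.where(mag > threshold, threshold / np.maximum(mag, 1e-12), 1.0)


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant source-to-distortion ratio in dB."""
    # Project the estimate onto the reference to remove any global scale.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=44100)  # stand-in for one second of "original" audio

g = toy_limiter_gain(x)  # sample-wise gain reduction
y = g * x                # limited (compressed) signal

# Sample-wise gain inversion: multiply each limited sample by an inverse gain.
# Here the inverse gain is an oracle; a network would predict it from y.
inv_gain = 1.0 / g
x_hat = inv_gain * y

print("SI-SDR of limited signal   :", si_sdr(x, y))
print("SI-SDR after oracle inverse:", si_sdr(x, x_hat))
```

With the oracle gain the reconstruction is exact up to floating-point error. The reported 24.0 dB SI-SDR reflects how closely a trained network approximates this inversion on real limiter output.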

Towards robust music source separation on loud commercial music

High-Fidelity Noise Reduction with Differentiable Signal Processing

Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks

Perceiving Music Quality with GANs

Generative De-Quantization for Neural Speech Codec via Latent Diffusion

MusicHiFi: Fast High-Fidelity Stereo Vocoding

Audio Decoding by Inverse Problem Solving

Machine Perceptual Quality: Evaluating the Impact of Severe Lossy Compression on Audio and Image Models

SCNet: Sparse Compression Network for Music Source Separation

Robust Lossy Audio Compression Identification

An Independence-promoting Loss for Music Generation with Language Models

Model and Deep learning based Dynamic Range Compression Inversion

Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding

The Whole Is Greater than the Sum of Its Parts: Improving Music Source Separation by Bridging Network

BigWavGAN: A Wave-To-Wave Generative Adversarial Network for Music Super-Resolution

Exploiting Time-Frequency Conformers for Music Audio Enhancement

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

GSEP: A robust vocal and accompaniment separation system using gated CBHG module and loudness normalization

The whole is greater than the sum of its parts: improving music source separation by bridging networks

InSE-NET: A Perceptually Coded Audio Quality Model based on CNN