High-Fidelity Noise Reduction with Differentiable Signal Processing

Christian J. Steinmetz, Thomas Walther, Joshua D. Reiss
2023-10-18
Abstract: Noise reduction techniques based on deep learning have demonstrated impressive performance in enhancing the overall quality of recorded speech. While these approaches are highly performant, their application in audio engineering can be limited due to a number of factors. These include operation only on speech without support for music, lack of real-time capability, lack of interpretable control parameters, operation at lower sample rates, and a tendency to introduce artifacts. On the other hand, signal processing-based noise reduction algorithms offer fine-grained control and operation on a broad range of content, however, they often require manual operation to achieve the best results. To address the limitations of both approaches, in this work we introduce a method that leverages a signal processing-based denoiser that when combined with a neural network controller, enables fully automatic and high-fidelity noise reduction on both speech and music signals. We evaluate our proposed method with objective metrics and a perceptual listening test. Our evaluation reveals that speech enhancement models can be extended to music, however training the model to remove only stationary noise is critical. Furthermore, our proposed approach achieves performance on par with the deep learning models, while being significantly more efficient and introducing fewer artifacts in some cases. Listening examples are available online at https://tape.it/research/denoiser.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the limitations of existing audio noise reduction techniques when applied to music signals. Deep learning-based noise reduction methods perform well at enhancing speech quality, but they often perform poorly on music, sometimes introducing artifacts or altering the sound characteristics of instruments. Signal processing-based noise reduction algorithms, on the other hand, offer fine control and broad content support but usually require manual operation to achieve optimal results. To overcome these limitations, the authors propose a method that combines a signal processing-based denoiser with a neural network controller to achieve fully automatic, high-fidelity noise reduction for both speech and music signals.

The method addresses these issues through the following points:

1. **Extending Speech Enhancement Models**: Extending existing speech enhancement models to music denoising, and finding that the standard enhancement pipeline (which removes both stationary and non-stationary noise) produces more artifacts.
2. **Hybrid Signal Processing and Deep Learning**: Proposing a hybrid approach that uses differentiable signal processing techniques to train the denoiser and employs a neural network controller to estimate the noise reduction parameters, achieving noise reduction on full-band stereo signals.
3. **Improved Gradient Approximation**: Designing a two-stage training process to address the gradient approximation problem for the ballistics (attack and release times) of dynamic range processors under significantly different conditions, making training more stable and efficient.

Through these methods, the authors aim to build a noise reduction system that operates automatically while still exposing parameter controls, with performance comparable to large deep learning models at a lower computational cost.
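
To make the idea of a parameterized denoiser with ballistics more concrete, here is a minimal sketch of a single-band noise gate with separate attack and release smoothing, plus a helper that maps normalized controller outputs to gate parameters. This is an illustration under stated assumptions, not the authors' implementation: the function names (`time_to_coeff`, `noise_gate`, `scale_controller_output`), the parameter ranges, and the choice of a simple sample-level detector are all hypothetical, and the neural network itself is omitted.

```python
import numpy as np


def time_to_coeff(time_ms, sample_rate):
    """Convert a time constant in milliseconds to a one-pole smoothing coefficient."""
    return float(np.exp(-1.0 / (time_ms * 1e-3 * sample_rate)))


def noise_gate(x, sample_rate, threshold_db=-40.0, reduction_db=-24.0,
               attack_ms=5.0, release_ms=100.0):
    """Attenuate the signal by `reduction_db` while its level is below the threshold.

    Returns the processed signal and the per-sample gain curve.
    """
    x = np.asarray(x, dtype=np.float64)
    alpha_attack = time_to_coeff(attack_ms, sample_rate)
    alpha_release = time_to_coeff(release_ms, sample_rate)
    threshold = 10.0 ** (threshold_db / 20.0)
    floor_gain = 10.0 ** (reduction_db / 20.0)

    gain = np.empty_like(x)
    g = floor_gain  # assumption: the gate starts closed
    for n in range(len(x)):
        # Target gain: fully open above the threshold, attenuated below it.
        target = 1.0 if abs(x[n]) >= threshold else floor_gain
        # Ballistics: attack coefficient while opening, release while closing.
        alpha = alpha_attack if target > g else alpha_release
        g = alpha * g + (1.0 - alpha) * target
        gain[n] = g
    return x * gain, gain


def scale_controller_output(raw):
    """Map four values in [0, 1] (e.g. sigmoid outputs) to gate parameters.

    The ranges below are hypothetical and chosen only for illustration.
    """
    raw = np.clip(np.asarray(raw, dtype=np.float64), 0.0, 1.0)
    return {
        "threshold_db": -80.0 + 80.0 * raw[0],   # -80 dB to 0 dB
        "reduction_db": -60.0 + 60.0 * raw[1],   # -60 dB to 0 dB
        "attack_ms": 0.1 + 99.9 * raw[2],        # 0.1 ms to 100 ms
        "release_ms": 1.0 + 999.0 * raw[3],      # 1 ms to 1000 ms
    }


if __name__ == "__main__":
    sr = 48000
    params = scale_controller_output([0.5, 0.6, 0.05, 0.1])
    noisy = 0.001 * np.random.default_rng(0).standard_normal(sr)
    denoised, gain_curve = noise_gate(noisy, sr, **params)
    print(params, float(gain_curve.mean()))
```

The per-sample recursion, with its branch between attack and release coefficients, is the kind of ballistics the third point refers to. The sketch only shows the forward computation; how gradients are propagated or approximated through it during training is not covered here.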