Abstract: Noise reduction techniques based on deep learning have demonstrated impressive performance in enhancing the overall quality of recorded speech. While these approaches are highly performant, their application in audio engineering can be limited due to a number of factors. These include operation only on speech without support for music, lack of real-time capability, lack of interpretable control parameters, operation at lower sample rates, and a tendency to introduce artifacts. On the other hand, signal processing-based noise reduction algorithms offer fine-grained control and operation on a broad range of content; however, they often require manual operation to achieve the best results. To address the limitations of both approaches, in this work we introduce a method that leverages a signal processing-based denoiser that, when combined with a neural network controller, enables fully automatic and high-fidelity noise reduction on both speech and music signals. We evaluate our proposed method with objective metrics and a perceptual listening test. Our evaluation reveals that speech enhancement models can be extended to music; however, training the model to remove only stationary noise is critical. Furthermore, our proposed approach achieves performance on par with the deep learning models, while being significantly more efficient and introducing fewer artifacts in some cases. Listening examples are available online at https://tape.it/research/denoiser.

What problem does this paper attempt to address?

This paper addresses the limitations of existing audio noise reduction techniques when applied to music signals. Deep learning-based noise reduction methods perform well at enhancing speech quality, but they often perform poorly on music, sometimes introducing artifacts or altering the sound characteristics of instruments. Signal processing-based noise reduction algorithms, on the other hand, offer fine-grained control and support a broad range of content, but usually require manual operation to achieve optimal results. To overcome these limitations, the authors propose a method that combines a signal processing-based denoiser with a neural network controller to achieve fully automatic, high-fidelity noise reduction for both speech and music signals. The paper makes three main points:

1. **Extending Speech Enhancement Models**: Existing speech enhancement models are extended to music denoising. The authors find that the standard enhancement pipeline, which removes both stationary and non-stationary noise, produces more artifacts, so training the model to remove only stationary noise is critical.
2. **Hybrid Signal Processing and Deep Learning**: A hybrid approach uses differentiable signal processing techniques to train the denoiser and employs a neural network controller to estimate the noise reduction parameters, enabling noise reduction on full-band stereo signals.
3. **Improved Gradient Approximation**: A two-stage training process addresses the gradient approximation problem for the ballistics (attack and release times) of dynamic range processors under significantly different conditions, making training more stable and efficient.

With these contributions, the authors aim for a noise reduction system that operates automatically while still exposing control parameters, with performance comparable to large deep learning models at a lower computational cost.
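To make the idea of a signal processing-based denoiser with interpretable parameters more concrete, here is a minimal sketch of a generic spectral gate in Python with NumPy. It is not the authors' denoiser. The function names, the parameters `threshold_db` and `reduction_db`, and the percentile-based noise floor estimate are all assumptions chosen for illustration. In the hybrid approach described above, a neural network controller would estimate such parameters from the input. In this sketch they are fixed by hand, and the noise is assumed to be stationary.

```python
import numpy as np


def _hann(n_fft):
    # Periodic Hann window; with hop = n_fft / 4 the squared windows
    # overlap-add to a constant, which keeps the inverse transform simple.
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform with reflect padding at both ends."""
    padded = np.pad(x, n_fft, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop
    window = _hann(n_fft)
    frames = np.stack(
        [padded[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, n_fft=512, hop=128):
    """Weighted overlap-add inverse of `stft`, trimmed to `length` samples."""
    window = _hann(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(n_fft + hop * (frames.shape[0] - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)
    return out[n_fft:n_fft + length]


def spectral_gate(x, threshold_db=12.0, reduction_db=-24.0, n_fft=512, hop=128):
    """Attenuate time-frequency bins that sit close to the noise floor.

    threshold_db: how far above the estimated noise floor a bin must be
                  to pass unchanged (hypothetical control parameter).
    reduction_db: gain applied to bins below the threshold
                  (hypothetical control parameter).
    """
    spec = stft(x, n_fft, hop)
    mag = np.abs(spec)
    # Assumption: the noise is stationary, so a low percentile over time
    # approximates the noise floor of each frequency bin.
    noise_floor = np.percentile(mag, 20, axis=0, keepdims=True)
    threshold = noise_floor * 10.0 ** (threshold_db / 20.0)
    floor_gain = 10.0 ** (reduction_db / 20.0)
    gain = np.where(mag > threshold, 1.0, floor_gain)
    return istft(spec * gain, len(x), n_fft, hop)
```

Calling `spectral_gate(x)` on a mono signal `x` returns a signal of the same length. Raising `threshold_db` or lowering `reduction_db` removes more noise at the risk of more audible artifacts. This is the kind of trade-off that otherwise has to be tuned by hand.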

High-Fidelity Noise Reduction with Differentiable Signal Processing
