Speech Dereverberation with a Reverberation Time Shortening Target

Rui Zhou, Wenye Zhu, Xiaofei Li

2023-06-06

Abstract: This work proposes a new learning target based on reverberation time shortening (RTS) for speech dereverberation. The learning target for dereverberation is usually set to the direct-path speech, optionally with some early reflections. This type of target abruptly truncates the reverberation, and thus may not be suitable for network training. The proposed RTS target suppresses reverberation while maintaining the exponential decay property of reverberation, which eases network training and thus reduces the signal distortion caused by prediction error. Moreover, this work experimentally studies how to adapt our previously proposed FullSubNet speech denoising network to speech dereverberation. Experiments show that RTS is a more suitable learning target than direct-path speech and early reflections, in that it better suppresses both reverberation and signal distortion. FullSubNet is able to achieve outstanding dereverberation performance.

Audio and Speech Processing, Sound

What problem does this paper attempt to address?

The paper addresses speech dereverberation, particularly in single-channel scenarios. Severe late reverberation degrades the quality and intelligibility of speech and can reduce the performance of downstream tasks such as automatic speech recognition (ASR). Traditional dereverberation methods are based on statistical models and signal processing algorithms, while in recent years deep neural networks (DNNs) have made significant progress on this problem.

The paper proposes a new learning target for speech dereverberation, the reverberation time shortening (RTS) target. Existing DNN-based methods typically use the direct-path speech, optionally with some early reflections, as the learning target. Such a target truncates the reverberation abruptly, which may be unsuitable for network training and may lead to large prediction errors and signal distortion. In contrast, the RTS target suppresses reverberation while preserving its exponential decay, which eases network training and reduces the signal distortion caused by prediction errors.

Additionally, the authors experimentally adapt their previously proposed FullSubNet speech denoising network to the dereverberation task. The experiments show that RTS is a more suitable learning target than direct-path speech or early reflections, since it better suppresses both reverberation and signal distortion, and that FullSubNet achieves excellent dereverberation performance.
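The summary above does not spell out how an RTS target is constructed. One plausible reading, sketched below under stated assumptions, is to damp the room impulse response (RIR) with an exponential window that starts at the direct path, so that the decay stays exponential but reaches -60 dB sooner. The function names `shorten_rir` and `synthetic_rir`, the use of the strongest tap as the direct-path position, and the example T60 values are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def shorten_rir(rir, fs, t60, target_t60):
    """Hypothetical helper: damp an RIR so its decay matches target_t60 (seconds).

    Assumes the RIR amplitude envelope decays as 10 ** (-3 * t / t60),
    i.e. 60 dB of decay over t60 seconds.
    """
    rir = np.asarray(rir, dtype=float)
    if target_t60 >= t60:
        # Nothing to shorten: keep the original response.
        return rir.copy()
    # Assumption: the strongest tap marks the direct-path arrival.
    onset = int(np.argmax(np.abs(rir)))
    # Time elapsed since the direct path; zero before it.
    t = np.maximum((np.arange(len(rir)) - onset) / fs, 0.0)
    # Extra decay rate needed to move from t60 to target_t60.
    rate = 3.0 * (1.0 / target_t60 - 1.0 / t60)
    window = 10.0 ** (-rate * t)
    return rir * window


def synthetic_rir(fs, t60, length_s, seed=0):
    """Toy RIR for illustration: unit direct path plus decaying noise."""
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    t = np.arange(n) / fs
    rir = 0.1 * rng.standard_normal(n) * 10.0 ** (-3.0 * t / t60)
    rir[0] = 1.0  # direct path
    return rir
```

In this sketch a training pair would be formed by convolving the same dry utterance with the original RIR (network input) and with the shortened RIR (learning target), for example `np.convolve(dry, shorten_rir(rir, fs, 0.8, 0.15))`. Unlike a direct-path or early-reflection target, the damped response has no hard cut-off: the tail is attenuated smoothly, which is the property the summary credits with easing training.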

Improve Speech Enhancement Using Perception-High-Related Time-Frequency Loss

On phase recovery and preserving early reflections for deep-learning speech dereverberation

Evaluation of the Method Based on Psr Techniques for Target Detection in Reverberation

RLS-Based Adaptive Dereverberation Tracing Abrupt Position Change of Target Speaker

Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations

A Time Domain Progressive Learning Approach with SNR Constriction for Single-Channel Speech Enhancement and Recognition

Dereverberation Based on Generalized Spectral Subtraction for Distant-Talking Speaker Recognition

Multi-channel Adaptive Dereverberation Tracing Abrupt Position Change of Target Speaker

Monaural Speech Dereverberation using Deformable Convolutional Networks

Lasso-based Reverberation Suppression in Automatic Speech Recognition

An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition

DRC-NET: Densely Connected Recurrent Convolutional Neural Network for Speech Dereverberation

Speech enhancement with frequency domain auto-regressive modeling

Joint Training of DNNs by Incorporating an Explicit Dereverberation Structure for Distant Speech Recognition

Lite-RTSE: Exploring a Cost-Effective Lite DNN Model for Real-Time Speech Enhancement in RTC Scenarios

Frequency-domain Dereverberation on Speech Signal Using Surround Retinex

USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering

Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Multi-channel adaptive dereverberation robust to abrupt change of target speaker position

Joint sparse representation based cepstral-domain dereverberation for distant-talking speech recognition