Abstract: Speech enhancement is an important preprocessing step in a wide variety of practical applications involving speech signals, and many signal-processing methods have been proposed for it. However, the lack of a comprehensive, quantitative evaluation of enhancement performance for multi-speech makes it difficult to choose an appropriate method for a multi-speech application. This work studies the implementation of several methods for multi-speech enhancement in indoor environments with reverberation times of T60 = 0 s and T60 = 0.3 s. Two types of enhancement approaches are proposed and compared. The first type comprises basic enhancement methods: delay-and-sum beamforming (DSB), minimum variance distortionless response (MVDR), linearly constrained minimum variance (LCMV), and independent component analysis (ICA). The second type comprises robust enhancement methods: improved MVDR and LCMV realized by eigendecomposition and diagonal loading. In addition, online enhancement performance based on the iteration of single-frame speech signals is investigated, as is the comprehensive performance of the various enhancement methods. The experimental results show that, among the basic methods, LCMV and ICA give the most stable enhancement; among the improved algorithms, methods that employ diagonal loading iterations perform better. In online enhancement, DSB with frequency masking (FM) yields the best signal-to-interference ratio (SIR) and can suppress interference. The comprehensive performance test shows that LCMV and ICA yield the best results when there is no reverberation, while DSB with FM yields the best SIR when reverberation is present.
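As an illustration of two of the beamformers named in the abstract, the following is a minimal sketch of textbook DSB and MVDR weights for a single frequency bin, with optional diagonal loading. It is not taken from the paper: the function names, the `loading` parameter, and the choice to scale the loading term by the average channel power are assumptions made for this example only.

```python
import numpy as np


def dsb_weights(d):
    """Delay-and-sum weights for steering vector d (hypothetical helper)."""
    # Normalising by d^H d makes the response toward d equal to 1.
    return d / np.vdot(d, d)


def mvdr_weights(R, d, loading=0.0):
    """Textbook MVDR weights w = R^-1 d / (d^H R^-1 d) for one frequency bin.

    R       : (M, M) Hermitian spatial covariance matrix
    d       : (M,) steering vector of the target speaker
    loading : assumed relative diagonal-loading factor (0 disables it)
    """
    M = R.shape[0]
    # Diagonal loading: add a scaled identity so R is better conditioned.
    # Scaling by the mean channel power is an assumption of this sketch.
    R_loaded = R + loading * (np.trace(R).real / M) * np.eye(M)
    # Solve R_loaded x = d instead of forming an explicit inverse.
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / np.vdot(d, Rinv_d)


# Small synthetic example: 4 microphones, random Hermitian positive-definite R.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R_example = A @ A.conj().T + 1e-3 * np.eye(4)
d_example = np.exp(-1j * np.pi * 0.3 * np.arange(4))  # assumed steering vector

w_mvdr = mvdr_weights(R_example, d_example, loading=0.01)
w_dsb = dsb_weights(d_example)
```

Both weight vectors satisfy the distortionless constraint w^H d = 1, and as the loading factor grows the MVDR solution tends toward the DSB solution, which is one way to see why diagonal loading trades interference suppression for robustness.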

Speech Enhancement by Short-Time Spectrum Estimation with Multivariate Laplace Speech Model

Speech Enhancement Based on Short-Time Spectral Amplitude Estimates in Low SNR

Speech Enhancement Approach Based on Minimum Estimate and Spectral Subtraction

Improved Speech Enhancement Algorithm Based on Short-Time Spectral Analysis

Speech Enhancement for Non-Stationary Noise Environments

Speech Enhancement Based On Analysis Synthesis Framework With Improved Pitch Estimation And Spectral Envelope Enhancement

Speech Enhancement Algorithm Based on Spectral Subtraction

Noise Estimation Using Mean Square Cross Prediction Error for Speech Enhancement

Speech Enhancement Based on Magnitude Estimation Using the Gamma Prior

Speech Enhancement Based on Analysis–Synthesis Framework with Improved Parameter Domain Enhancement

Speech Enhancement Based on Minimum Band Energy in Variable Noise-level Environments

Speech Enhancement Using Non-Negative Spectrogram Models With Mel-Generalized Cepstral Regularization

Enhancement Algorithm for Low Signal to Noise Ratio Speech

A Speech Enhancement Approach Using Piecewise Linear Approximation of an Explicit Model of Environmental Distortions

Exploring Conventional Enhancement and Separation Methods for Multi‐speech Enhancement in Indoor Environments

A generalized time-frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system

Adaptive two-channel speech enhancement algorithm based on the modulation spectrum

Error Modeling Via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement

Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

Incorporation of a modified temporal cepstrum smoothing in both signal-to-noise ratio and speech presence probability estimation for speech enhancement

Multichannel Speech Enhancement Based on Time-Frequency Masking Using Subband Long Short-Term Memory