Abstract: Speech enhancement is an important preprocessing step in a wide range of practical applications involving speech signals, and many signal-processing methods have been proposed for it. However, the lack of a comprehensive, quantitative evaluation of enhancement performance for multi-speech makes it difficult to choose an appropriate method for a multi-speech application. This work studies the implementation of several methods for multi-speech enhancement in indoor environments with reverberation times of T60 = 0 s and T60 = 0.3 s. Two types of enhancement approaches are proposed and compared. The first type comprises basic enhancement methods: delay-and-sum beamforming (DSB), minimum variance distortionless response (MVDR), linearly constrained minimum variance (LCMV), and independent component analysis (ICA). The second type comprises robust enhancement methods: improved MVDR and LCMV realized by eigendecomposition and diagonal loading. In addition, online enhancement based on iteration over single-frame speech signals is examined, as is the comprehensive performance of the various methods. The experimental results show that, among the basic methods, LCMV and ICA are comparatively more stable; among the improved algorithms, methods that employ diagonal loading iterations perform better. For online enhancement, DSB with frequency masking (FM) yields the best signal-to-interference ratio (SIR) and can suppress interference. The comprehensive performance test shows that LCMV and ICA yield the best results when there is no reverberation, while DSB with FM yields the best SIR when reverberation is present.
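
For orientation, the beamformers named in the abstract can be written in their standard textbook forms. This is a sketch under assumed notation, not the paper's own formulation: the number of microphones M, the steering vector d toward the target (assumed to have unit-modulus entries, so that d^H d = M), the spatial covariance matrix R, the constraint matrix C with desired response f (so that C^H w = f), and the loading level ε are all symbols introduced here for illustration.

```latex
\begin{align}
  % Delay-and-sum: align the channels toward the target and average them
  \mathbf{w}_{\mathrm{DSB}}  &= \frac{\mathbf{d}}{M} \\
  % MVDR: minimise output power subject to w^H d = 1
  \mathbf{w}_{\mathrm{MVDR}} &= \frac{\mathbf{R}^{-1}\mathbf{d}}
                                     {\mathbf{d}^{H}\mathbf{R}^{-1}\mathbf{d}} \\
  % LCMV: minimise output power subject to C^H w = f
  \mathbf{w}_{\mathrm{LCMV}} &= \mathbf{R}^{-1}\mathbf{C}
      \left(\mathbf{C}^{H}\mathbf{R}^{-1}\mathbf{C}\right)^{-1}\mathbf{f} \\
  % Diagonal loading: use R_eps in place of R in the expressions above
  \mathbf{R}_{\varepsilon}   &= \mathbf{R} + \varepsilon\,\mathbf{I},
      \qquad \varepsilon > 0
\end{align}
```

Diagonal loading regularizes the matrix inverse when R is estimated from few frames or is ill-conditioned. How ε is chosen or iterated in the paper is not stated in the abstract, so it is left as a free parameter here.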

Improving Monaural Speech Enhancement by Mapping to Fixed Simulation Space With Knowledge Distillation

Injecting Spatial Information for Monaural Speech Enhancement via Knowledge Distillation

SE Territory: Monaural Speech Enhancement Meets the Fixed Virtual Perceptual Space Mapping

Audio-Visual Speech Enhancement with Deep Multi-modality Fusion

A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation

Sub-band Knowledge Distillation Framework for Speech Enhancement

Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods

Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

Exploring the Potential of Data-Driven Spatial Audio Enhancement Using a Single-Channel Model

Efficient Multi-Channel Speech Enhancement with Spherical Harmonics Injection for Directional Encoding

Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios

A Refining Underlying Information Framework for Monaural Speech Enhancement

Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations

Innovative Directional Encoding in Speech Processing: Leveraging Spherical Harmonics Injection for Multi-Channel Speech Enhancement

Real-time Stereo Speech Enhancement with Spatial-Cue Preservation based on Dual-Path Structure

Exploring Conventional Enhancement and Separation Methods for Multi‐speech Enhancement in Indoor Environments

End-to-End Paired Ambisonic-Binaural Audio Rendering

Selective State Space Model for Monaural Speech Enhancement

On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training

Improving Speech Enhancement Using Audio Tagging Knowledge from Pre-Trained Representations and Multi-Task Learning