Abstract: Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome this issue, we propose MP-SENet, a novel Speech Enhancement Network that explicitly enhances Magnitude and Phase spectra in parallel. The proposed MP-SENet comprises a Transformer-embedded encoder-decoder architecture. The encoder encodes the input distorted magnitude and phase spectra into time-frequency representations, which are then fed into time-frequency Transformers that alternately capture time and frequency dependencies. The decoder comprises a magnitude mask decoder and a phase decoder, which directly enhance the magnitude and wrapped phase spectra by incorporating a magnitude masking architecture and a phase parallel estimation architecture, respectively. Multi-level loss functions explicitly defined on the magnitude spectra, wrapped phase spectra, and short-time complex spectra are adopted to jointly train the MP-SENet model. A metric discriminator is further employed to compensate for the incomplete correlation between these losses and human auditory perception. Experimental results demonstrate that the proposed MP-SENet achieves state-of-the-art performance across multiple speech enhancement tasks, including speech denoising, dereverberation, and bandwidth extension. Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between the magnitude and phase through explicit phase estimation, elevating the perceptual quality of enhanced speech.
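The abstract names magnitude masking, parallel phase estimation, and phase wrapping without giving formulas. The following is a minimal sketch of those ideas for a single time-frequency bin, not the MP-SENet implementation. It assumes that the wrapped phase is obtained by applying atan2 to two parallel real-valued outputs and that phase error is measured on the circle; all function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap_phase(phi):
    """Map any angle to its principal value in [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def anti_wrapped_error(phi_est, phi_ref):
    """Absolute phase error measured on the circle (never exceeds pi).

    Assumption: subtracting the nearest multiple of 2*pi, so that estimates
    just either side of the +/-pi wrap point count as close.
    """
    diff = phi_est - phi_ref
    return abs(diff - TWO_PI * round(diff / TWO_PI))


def phase_from_parallel_outputs(pseudo_real, pseudo_imag):
    """Combine two parallel real-valued outputs into a wrapped phase.

    Assumption: the two values stand in for the outputs of two parallel
    branches of a phase decoder; atan2 keeps the result in [-pi, pi].
    """
    return math.atan2(pseudo_imag, pseudo_real)


def apply_magnitude_mask(noisy_magnitude, mask):
    """Scale a noisy magnitude bin by a non-negative mask value."""
    return noisy_magnitude * mask


def to_complex(magnitude, phase):
    """Rebuild a complex spectrum bin from its magnitude and phase."""
    return complex(magnitude * math.cos(phase), magnitude * math.sin(phase))


# One bin: mask the magnitude, estimate the phase, rebuild the complex value.
enhanced_mag = apply_magnitude_mask(2.0, 0.5)
enhanced_phase = phase_from_parallel_outputs(0.0, 3.0)
enhanced_bin = to_complex(enhanced_mag, enhanced_phase)
```

The circular error shows why wrapping matters: phases of 3.1 and -3.1 radians differ by 6.2 as plain numbers, but by only about 0.083 radians on the circle.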

Speech Enhancement Method with Geometric Phase Estimation by Incorporating MIXMAX Model

Speech Enhancement Using Modified MMSE-LSA and Phase Reconstruction in Voiced and Unvoiced Speech

A Speech Enhancement Method Based on Dual-Path Phase-Aware GAN Networks

Speech Enhancement Algorithm Based on Super-Gaussian Mixture Model of Speech Spectral Amplitude Distribution

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

A Modified Speech Enhancement Algorithm Using a Universal Speaker Model

Speech Enhancement with Gamma Speech Modeling

Mask Estimation Incorporating Phase-Sensitive Information for Speech Enhancement

Speech Enhancement by Short-Time Spectrum Estimation with Multivariate Laplace Speech Model

Speech Enhancement Based on Magnitude Estimation Using the Gamma Prior

Speech Enhancement Based on Phase Space Reconstruction

Speech Enhancement Using Magnitude and Phase Spectrum Compensation

Speech Enhancement for Nonstationary Noise Environments

Improved speech enhancement algorithm using Bayesian nonnegative matrix factorization

Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

GMM-Based A Priori SNR Estimation in Speech Enhancement

Speech Enhancement Based on Masking Properties and Short-Time Spectral Amplitude Estimation

Speech Enhancement Based on Laplacian-Gaussian Model and Simplified Phase Discrimination in Discrete Cosine Transform Domain

Speech enhancement by short-time spectrum estimation with multivariate Laplace speech model | Wieloczynnikowy model mowy Laplace'a w estymatorze spektrum krótkookresowego, na potrzeby polepszenia dźwięku

Speech Enhancement Algorithm Based on Spectral Subtraction

Speech Enhancement Using A Modulation Domain Kalman Filter Post-Processor With A Gaussian Mixture Noise Model