Abstract: Background noise and competing talkers are among the main obstacles to speech understanding for cochlear implant (CI) users in naturalistic settings. These external factors distort the time-frequency (T-F) content of speech signals, including both the magnitude spectrum and the phase. While most existing speech enhancement (SE) solutions focus solely on enhancing the magnitude response, recent research highlights the importance of phase in perceptual speech quality. Motivated by multi-task machine learning, this study proposes a deep complex convolution transformer network (DCCTN) for complex spectral mapping, which simultaneously enhances the magnitude and phase responses of speech. The proposed network uses a complex-valued U-Net structure with a transformer in the bottleneck layer to capture low-level detail and contextual information in the T-F domain. To capture the harmonic correlation in speech, DCCTN incorporates a frequency transformation block in the encoder of the U-Net architecture. The DCCTN learns a complex transformation matrix to recover speech in the T-F domain from a noisy input spectrogram. Experimental results demonstrate that the proposed DCCTN outperforms existing models such as the convolutional recurrent network (CRN), deep complex convolutional recurrent network (DCCRN), and gated convolutional recurrent network (GCRN) in terms of objective speech intelligibility and quality, for both seen and unseen noise conditions. To evaluate the effectiveness of the proposed SE solution, a formal listener evaluation involving four CI recipients was conducted. Results indicate a significant improvement in speech intelligibility for CI recipients in noisy environments. Additionally, DCCTN is able to suppress highly non-stationary noise without introducing the musical artifacts commonly observed in conventional SE methods.
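The distinction the abstract draws between magnitude-only enhancement and complex spectral processing can be illustrated with a small numerical sketch. This is not the DCCTN: no network is trained here, and an oracle complex ratio mask computed from the clean signal stands in for whatever a complex-valued model would predict. The toy signal, the `stft` helper, the frame sizes, and the error measure are all assumptions made for illustration only.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Naive STFT with a Hann window; returns a (frames, bins) complex array."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Toy data (assumption): a 440 Hz tone stands in for speech, plus white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)

S = stft(clean)   # clean complex spectrogram
Y = stft(noisy)   # noisy complex spectrogram

# Magnitude-only enhancement: even with a perfect magnitude estimate,
# the noisy phase is reused unchanged.
mag_only = np.abs(S) * np.exp(1j * np.angle(Y))

# Complex enhancement: an oracle complex ratio mask M = S / Y (a stand-in
# for a model's complex-valued output) corrects magnitude and phase jointly.
eps = 1e-8
M = S * np.conj(Y) / (np.abs(Y) ** 2 + eps)
complex_est = M * Y

def spectral_error(est, ref):
    """Mean squared error between two complex spectrograms."""
    return float(np.mean(np.abs(est - ref) ** 2))

err_mag_only = spectral_error(mag_only, S)
err_complex = spectral_error(complex_est, S)
print(f"magnitude-only error: {err_mag_only:.6f}")
print(f"complex-mask error:   {err_complex:.6f}")
```

In this oracle setting the magnitude-only estimate retains a residual error that comes entirely from the noisy phase, while the complex-valued estimate removes it. A learned model would only approximate the oracle mask, so the gap seen here is an upper bound on the benefit, not a prediction of DCCTN's performance.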

A Fully Convolutional Neural Network for Speech Enhancement

Environment-Dependent Attention-Driven Recurrent Convolutional Neural Network for Robust Speech Enhancement

An RNN-based Speech Enhancement Method for a Binaural Hearing Aid System

A Real-Time Speech Enhancement Algorithm Based on Convolutional Recurrent Network and Wiener Filter

Efficient Gated Convolutional Recurrent Neural Networks for Real-Time Speech Enhancement

Redundant Convolutional Network with Attention Mechanism for Monaural Speech Enhancement

Deep Neural Network Based Noised Asian Speech Enhancement and Its Implementation on a Hearing Aid App

Convolutional Recurrent Neural Network with Attention for 3D Speech Enhancement

Speech Enhancement Algorithm Based on Microphone Array and Lightweight CRN for Hearing Aid

Shortcut-Based Fully Convolutional Network for Speech Enhancement

Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition

Auditory filterbank denoising neural network for speech enhancement in wearable auditory device

Speech Enhancement for Cochlear Implant Recipients using Deep Complex Convolution Transformer with Frequency Transformation

CFTNet: Complex-valued Frequency Transformation Network for Speech Enhancement

End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement

Speech enhancement using progressive learning-based convolutional recurrent neural network

Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement

AMRConvNet: AMR-Coded Speech Enhancement Using Convolutional Neural Networks

Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks

Efficient High-Performance Bark-Scale Neural Network for Residual Echo and Noise Suppression