Abstract: Acoustic echo is a persistent problem in telecommunication: it degrades speech quality and can disrupt communication entirely or for a period of time, which is why acoustic echo cancellation (AEC) systems were developed. Demand for AEC has risen significantly since the 2020 global pandemic, as speakers and listeners now communicate from unpredictable settings such as home environments, where echo and noise significantly disrupt communication. Numerous AEC solutions have been proposed, including adaptive filters and deep learning techniques. However, their effectiveness drops notably in double-talk scenarios, where the near-end and far-end speakers talk simultaneously, and in noisy environments. This paper proposes a novel transQT neural network (TNN), an end-to-end neural network that combines the constant Q transform (CQT) with a transformer-inspired self-attention module to remove echo and noise in noisy double-talk scenarios. It is trained with the smooth L1 loss function to enable efficient training and improve overall performance. In the proposed TNN, the CQT serves as the front end, converting the signal from the time domain to the time-frequency domain. The CQT is chosen to improve speech quality: its logarithmic frequency scale aligns more closely with the human auditory system. The attention module is incorporated among the layers of the proposed model to focus on the double-talk and noisy parts of the speech, making it easier for the AEC model to separate the clean target signal from the parts affected by double-talk and noise. The smooth L1 loss is employed for smooth training and stable, efficient convergence; it is also less sensitive to variability in the data, which reduces large errors and the overall loss. Experiments were conducted for both causal and non-causal scenarios. The proposed TNN model demonstrated superior performance in speech quality, as measured by the perceptual evaluation of speech quality (PESQ), and a significant reduction of echo, as quantified by echo return loss enhancement (ERLE). Performance was further evaluated using the correlation coefficient, which indicates the relationship between the clean and the echo signal.
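
The abstract names the smooth L1 loss, ERLE, and a correlation coefficient without giving their formulas. The following is a minimal sketch of those three quantities, assuming their commonly used definitions; the paper's exact formulation may differ. The function names, the `beta` threshold, and the `eps` guard are illustrative assumptions, not taken from the paper.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic for |error| < beta, linear otherwise.

    beta is an assumed threshold; the abstract does not state one.
    """
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def erle_db(mic, residual, eps=1e-12):
    """ERLE in dB: microphone-signal power over residual power after cancellation.

    eps is an assumed guard against division by zero and log of zero.
    """
    p_mic = sum(x * x for x in mic) / len(mic)
    p_res = sum(x * x for x in residual) / len(residual)
    return 10.0 * math.log10((p_mic + eps) / (p_res + eps))


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length signals."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0
```

In this sketch, small errors are penalised quadratically and large ones only linearly, which is the property the abstract relies on when it describes the loss as less sensitive to variability in the data. A higher ERLE value means more echo has been removed.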

Novel TransQT Neural Network: A Deep Learning Framework for Acoustic Echo Cancellation in Noisy Double-Talk Scenario
