Abstract: Strong echo significantly impairs the quality and intelligibility of speech during communication. Numerous methods and models have been proposed to cancel echo. In this study, we propose a multi-stage acoustic echo cancellation model that combines an adaptive filter with a deep neural network. Our model consists of two parts: the Speex algorithm for canceling linear echo, and the multi-scale time-frequency UNet (MSTFUNet) for further echo cancellation. The Speex algorithm takes the far-end reference speech and the near-end microphone signal as inputs, and outputs the signal after linear echo cancellation. MSTFUNet takes the spectra of the far-end reference speech, the near-end microphone signal, and the output of Speex as inputs, and generates the estimated near-end speech spectrum as output. To enhance the performance of the Speex algorithm, we apply delay estimation and compensation to the far-end reference speech. For MSTFUNet, we employ multi-scale time-frequency processing to extract information from the input spectrum. Additionally, we incorporate an improved time-frequency self-attention to capture time-frequency information. Furthermore, we introduce channel time-frequency attention to alleviate information loss during downsampling and upsampling. In our experiments, we evaluate the performance of our proposed model on both our test set and the blind test set of the Acoustic Echo Cancellation challenge. Our proposed model exhibits superior performance in terms of acoustic echo cancellation and noise and reverberation suppression compared to other models.
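
The delay estimation and compensation step applied to the far-end reference speech can be illustrated with a minimal sketch. The abstract does not say how the delay is estimated, so the time-domain cross-correlation search below is an assumption, and the names `estimate_delay`, `compensate_delay`, and `max_delay` are hypothetical. The aligned reference would then be passed to the linear echo canceller.

```python
import numpy as np


def estimate_delay(far_end, mic, max_delay):
    """Estimate how many samples the echo in `mic` lags behind `far_end`.

    Assumption: plain cross-correlation over lags 0..max_delay; the paper's
    actual estimator is not described in the abstract.
    """
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)
    n = min(len(far_end), len(mic))
    if not 0 <= max_delay < n:
        raise ValueError("max_delay must be in [0, signal length)")

    best_lag, best_score = 0, -np.inf
    for lag in range(max_delay + 1):
        # Correlate the microphone signal shifted by `lag` with the reference.
        score = np.dot(mic[lag:n], far_end[: n - lag])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def compensate_delay(far_end, delay):
    """Delay the far-end reference by `delay` samples (zero-padded at the start)."""
    far_end = np.asarray(far_end, dtype=float)
    if delay == 0:
        return far_end.copy()
    return np.concatenate([np.zeros(delay), far_end[:-delay]])
```

Searching only non-negative lags reflects the assumption that the echo always arrives after the reference is played; a frequency-domain estimator such as GCC-PHAT would be a common alternative for long signals.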

iPhonMatchNet: Zero-Shot User-Defined Keyword Spotting Using Implicit Acoustic Echo Cancellation

PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords

Keyword Spotting Based on Phoneme Confusion Matrix

Data Augmentation for Robust Keyword Spotting under Playback Interference

Broadcasted Residual Learning for Efficient Keyword Spotting

A New Keyword Spotting Approach for Spontaneous Mandarin Speech

Non-uniform MCE Based Acoustic Model for Keyword Spotting based on Deep Neural Network

NEC: Speaker Selective Cancellation via Neural Enhanced Ultrasound Shadowing

Bridging the Gap Between Audio and Text Using Parallel-Attention for User-Defined Keyword Spotting

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

Keyword-specific normalization based keyword spotting for spontaneous speech

Focal Loss And Double-Edge-Triggered Detector For Robust Small-Footprint Keyword Spotting

A Multi-Stage Acoustic Echo Cancellation Model Based on Adaptive Filters and Deep Neural Networks

Model compression applied to small-footprint keyword spotting

A Two-Step Keyword Spotting Method Using Fuzzy Search Algorithm

Keyword Spotting Based on Syllable Confusion Network

Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege

SLiCK: Exploiting Subsequences for Length-Constrained Keyword Spotting

Spot keywords from very noisy and mixed speech

ICASSP 2021 Acoustic Echo Cancellation Challenge: Integrated Adaptive Echo Cancellation with Time Alignment and Deep Learning-Based Residual Echo Plus Noise Suppression