Abstract: As automatic speech recognition has matured, deployment of the voice user interface (VUI) has expanded rapidly. Since the COVID-19 pandemic in particular, the VUI has gained attention in online communication owing to its contactless nature. However, the VUI remains difficult to apply in public settings because ambient noise degrades the received audio signal. In this article, we propose Wavoice, the first noise-resistant multi-modal speech recognition system that fuses two distinct voice-sensing modalities: millimeter-wave signals and audio signals from a microphone. One key contribution is a model of the inherent correlation between millimeter-wave and audio signals. Building on this model, Wavoice enables real-time, noise-resistant voice activity detection and identification of the target user among multiple speakers. Additionally, we design two novel multi-modal fusion modules embedded in the neural network, which lead to accurate speech recognition. Extensive experiments demonstrate the effectiveness of Wavoice under adverse conditions, with a character error rate below 1% at distances of up to 7 m. In terms of robustness and accuracy, Wavoice considerably outperforms existing audio-only speech recognition methods, achieving lower character error and word error rates.

Research of a non-specific person noise-robust speech recognition system

How Noise and Language Proficiency Influence Speech Recognition by Individual Non-Native Listeners.

Research on Speaker-Depended Isolated-Word Speech Recognition System

Design and implementation of a speaker recognition system

Real-time Speaker Recognition System for PDA

Wavoice: A mmWave-assisted Noise-resistant Speech Recognition System (Just Accepted)

A Noise Robust Front End Algorithm for Mandarin Speech Recognition and Performance Analysis

Speech Recognition System Based on CDHMM/SOFMNN in Noisy Environment

Speech Recognition System Based on SCHMM/ANN in Noisy Environment

Autoregressive Model-Based Robust Speech Recognition in Additive Noise Environment

Noise Estimation Using Mean Square Cross Prediction Error for Speech Enhancement

Wavoice: an Mmwave-Assisted Noise-Resistant Speech Recognition System

Modified MFCCs for Robust Speaker Recognition

Research of Characteristic Parameters Extraction Based on Embedded Speech Recognition System

Design and implementation of speech recognition algorithm based on frequency range

Residual Noise Compensation For Robust Speech Recognition In Nonstationary Noise

VTS-based Robust Speech Recognition

Threshold-Based Noise Detection and Reduction for Automatic Speech Recognition System in Human-Robot Interactions

Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions