Abstract: We propose a novel speaker-dependent speech separation framework for the challenging CHiME-5 acoustic environments. It exploits the advantages of both deep learning based and conventional preprocessing techniques to prepare data effectively for separating target speech from multi-talker mixed speech collected with multiple microphone arrays. First, a series of multi-channel operations is conducted to reduce reverberation and noise, and a single-channel deep learning based speech enhancement model is used to predict speech presence probabilities. Next, a two-stage supervised speech separation approach, using oracle speaker diarization information from CHiME-5, is proposed to separate the speech of a target speaker from that of interfering speakers in mixed speech. Given a set of three estimated masks (for the background noise, the target speaker, and the interfering speakers) from the single-channel speech enhancement and separation models, a complex Gaussian mixture model based generalized eigenvalue beamformer is then used to enhance the signal at the reference array while avoiding the speaker permutation issue. Furthermore, the proposed front-end can generate a large variety of processed data for an ensemble of speech recognition results. Experiments on the development set show that the proposed two-stage approach yields significant improvements in recognition performance over the official baseline system, and the resulting system achieved top accuracies in all four competing evaluation categories among all systems submitted to the CHiME-5 Challenge.
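To make the final beamforming step more concrete, the following is a minimal sketch of mask-driven generalized eigenvalue (GEV) beamforming in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: it skips the complex Gaussian mixture model step and uses the three masks directly, it merges the noise and interferer masks into a single "undesired" mask, and the function names, array layout (frequency, time, channel), and diagonal loading value are all hypothetical choices made for this example.

```python
import numpy as np


def masked_covariance(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance for each frequency bin.

    Y:    complex STFT of shape (F, T, C) (assumed layout).
    mask: real-valued mask of shape (F, T) with values in [0, 1].
    Returns an array of shape (F, C, C).
    """
    weighted = Y * mask[..., None]
    phi = np.einsum("ftc,ftd->fcd", weighted, Y.conj())
    return phi / (mask.sum(axis=1)[:, None, None] + eps)


def gev_weights(phi_target, phi_undesired, loading=1e-6):
    """Solve phi_target w = lambda * phi_undesired w per frequency bin
    and keep the eigenvector with the largest eigenvalue."""
    F, C, _ = phi_target.shape
    W = np.empty((F, C), dtype=complex)
    for f in range(F):
        # Diagonal loading (illustrative value) keeps the Cholesky factor defined.
        scale = np.trace(phi_undesired[f]).real / C + 1e-10
        phi_u = phi_undesired[f] + loading * scale * np.eye(C)
        L = np.linalg.cholesky(phi_u)            # phi_u = L @ L^H
        L_inv = np.linalg.inv(L)
        A = L_inv @ phi_target[f] @ L_inv.conj().T
        A = 0.5 * (A + A.conj().T)               # enforce Hermitian symmetry
        _, vecs = np.linalg.eigh(A)              # eigenvalues in ascending order
        W[f] = L_inv.conj().T @ vecs[:, -1]      # map back: w = L^{-H} v
    return W


def extract_target(Y, target_mask, interferer_mask, noise_mask):
    """Beamform toward the target using the three estimated masks.

    Assumption: noise and interferers are pooled into one undesired class.
    """
    undesired_mask = np.clip(interferer_mask + noise_mask, 0.0, 1.0)
    phi_target = masked_covariance(Y, target_mask)
    phi_undesired = masked_covariance(Y, undesired_mask)
    W = gev_weights(phi_target, phi_undesired)
    # Output is w^H y for every time-frequency bin.
    return np.einsum("fc,ftc->ft", W.conj(), Y)
```

Because the target statistics come from a mask that is tied to a known speaker, the beamformer output is assigned to that speaker by construction, which is one way to read the abstract's remark about avoiding the speaker permutation issue. A practical system would typically add a post-filter or normalization to limit the spectral distortion that GEV weights can introduce; that step is omitted here.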

Robust Front-End for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model

CASA Based Speech Separation for Robust Speech Recognition

Cross-modal Mask Fusion and Modality-Balanced Audio-Visual Speech Recognition

Design and implementation of a speaker recognition system

A Speech Enhancement Algorithm Using Computational Auditory Scene Analysis with Spectral Subtraction

Wavoice: A mmWave-assisted Noise-resistant Speech Recognition System

A Speech Enhancement Algorithm Based on Computational Auditory Scene Analysis

Robust speaker recognition using glottal information‐based cepstral mean subtraction

Using an Adjustment Training and a Smoothing Mask for Speech Segregation

A Dual-Microphone Speech Enhancement Algorithm for Close-Talk System

A Dual Microphone Speech Enhancement Method With A Smoothing Parameter Mask

Flexible Multichannel Speech Enhancement for Noise-Robust Frontend

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing

Robust speech recognition in noisy backgrounds based on Teager energy operator and auditory process

Compensation of Speech Enhancement Distortion for Robust Speech Recognition

A Speaker-Dependent Approach to Separation of Far-Field Multi-Talker Microphone Array Speech for Front-End Processing in the CHiME-5 Challenge

A Noise Robust Front End Algorithm for Mandarin Speech Recognition and Performance Analysis