Abstract: Supervised models for speech enhancement are trained using artificially generated mixtures of clean speech and noise signals. However, the synthetic training conditions may not accurately reflect real-world conditions encountered during testing. This discrepancy can result in poor performance when the test domain significantly differs from the synthetic training domain. To tackle this issue, the UDASE task of the 7th CHiME challenge aimed to leverage real-world noisy speech recordings from the test domain for unsupervised domain adaptation of speech enhancement models. Specifically, this test domain corresponds to the CHiME-5 dataset, characterized by real multi-speaker and conversational speech recordings made in noisy and reverberant domestic environments, for which ground-truth clean speech signals are not available. In this paper, we present the objective and subjective evaluations of the systems that were submitted to the CHiME-7 UDASE task, and we provide an analysis of the results. This analysis reveals a limited correlation between subjective ratings and several supervised nonintrusive performance metrics recently proposed for speech enhancement. Conversely, the results suggest that more traditional intrusive objective metrics can be used for in-domain performance evaluation using the reverberant LibriCHiME-5 dataset developed for the challenge. The subjective evaluation indicates that all systems successfully reduced the background noise, but always at the expense of increased distortion. Out of the four speech enhancement methods evaluated subjectively, only one demonstrated an improvement in overall quality compared to the unprocessed noisy speech, highlighting the difficulty of the task. The tools and audio material created for the CHiME-7 UDASE task are shared with the community.

Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge
