Abstract: Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a combination of these two approaches is proposed. For the DNN-based bottleneck feature, we noted that DNNs can transform the reverberant speech feature to a new feature space with greater discriminative classification ability for distant-talking speaker recognition. In contrast, cepstral domain DAE-based dereverberation tries to suppress the reverberation by mapping the cepstrum of reverberant speech to that of clean speech, with the expectation of improving the performance of distant-talking speaker recognition. Since the DNN-based discriminant bottleneck feature and DAE-based dereverberation are strongly complementary, the combination of these two methods is expected to be very effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set, with reverberant environments differing from the training environments. In suppressing late reverberation, our method outperformed some state-of-the-art dereverberation approaches such as multichannel least mean squares (MCLMS). Compared with MCLMS, we obtained relative error rate reductions of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, combining the likelihoods of the DNN-based bottleneck feature and DAE-based dereverberation further improved the performance.
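The abstract does not say how the likelihoods of the two systems are combined. As a minimal sketch, assuming a weighted linear combination of per-speaker log-likelihoods, the fusion step might look like the following. The function name `fuse_and_identify`, the weight `alpha`, and the toy scores are all hypothetical and are not taken from the paper.

```python
def fuse_and_identify(loglik_bnf, loglik_dae, alpha=0.5):
    """Fuse per-speaker log-likelihoods from two systems and pick a speaker.

    loglik_bnf: scores from the bottleneck-feature system, one per speaker.
    loglik_dae: scores from the DAE-dereverberated system, same speaker order.
    alpha:      assumed fusion weight in [0, 1] given to the bottleneck system.
    """
    if len(loglik_bnf) != len(loglik_dae):
        raise ValueError("both systems must score the same set of speakers")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Weighted sum of the two log-likelihoods for each enrolled speaker.
    fused = [alpha * a + (1.0 - alpha) * b
             for a, b in zip(loglik_bnf, loglik_dae)]

    # The identified speaker is the one with the highest fused score.
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Toy scores for three enrolled speakers (made-up numbers for illustration).
bnf_scores = [-120.0, -110.0, -125.0]
dae_scores = [-118.0, -119.0, -105.0]
best, fused = fuse_and_identify(bnf_scores, dae_scores, alpha=0.5)
```

In this toy case the two systems disagree when used alone (speaker 1 for the bottleneck scores, speaker 2 for the DAE scores), and the fused decision depends on `alpha`. In practice such a weight would typically be tuned on held-out data.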

A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks

Robust Speech Recognition With Speech Enhanced Deep Neural Networks

Joint Training Of Front-End And Back-End Deep Neural Networks For Robust Speech Recognition

Noise Robust Speech Recognition on Aurora4 by Humans and Machines.

Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training

Joint Noise and Mask Aware Training for DNN-based Speech Enhancement with SUB-band Features

Dynamic noise aware training for speech enhancement based on deep neural networks.

A Binaural Deep Neural Networks Parameter Mask for the Robust Automatic Speech Recognition System

Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise

Modelling human speech recognition in challenging noise maskers using machine learning

Speech Separation Based on Signal-Noise-dependent Deep Neural Networks for Robust Speech Recognition

Very Deep Convolutional Neural Networks for Robust Speech Recognition

Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training

Deep Neural Network-Based Bottleneck Feature and Denoising Autoencoder-Based Dereverberation for Distant-Talking Speaker Identification.

A Mask Free Neural Network for Monaural Speech Enhancement

Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations

A hybrid discriminant fuzzy DNN with enhanced modularity bat algorithm for speech recognition

Spectral Masking With Explicit Time-Context Windowing for Neural Network-Based Monaural Speech Enhancement

Revisiting Acoustic Features for Robust ASR

A Spectral-change-aware Loss Function for DNN-based Speech Separation.