Abstract: Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral-domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a combination of the two approaches is proposed. For the DNN-based bottleneck feature, we note that a DNN can transform reverberant speech features into a new feature space with greater discriminative power for distant-talking speaker recognition. In contrast, cepstral-domain DAE-based dereverberation attempts to suppress reverberation by mapping the cepstrum of reverberant speech to that of clean speech, with the aim of improving distant-talking speaker recognition performance. Because the DNN-based discriminative bottleneck feature and DAE-based dereverberation are strongly complementary, combining the two methods is expected to be highly effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set whose reverberant environments differ from the training environments. In suppressing late reverberation, our method outperformed some state-of-the-art dereverberation approaches, such as multichannel least mean squares (MCLMS). Compared with MCLMS, the bottleneck feature and the autoencoder feature reduced the relative error rate by 21.4% and 47.0%, respectively. Moreover, combining the likelihoods of the DNN-based bottleneck feature system and the DAE-based dereverberation system further improved performance.
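The abstract says that the likelihoods of the two systems are combined but does not give the combination rule. The Python sketch below assumes a weighted sum of per-speaker log-likelihoods, controlled by a hypothetical weight `alpha`, followed by an argmax decision over the enrolled speakers. It also shows how a relative error rate reduction, such as the reported 21.4% and 47.0%, is computed from a baseline error rate and a new one. All function names, the weighting scheme, and the scores in the usage example are illustrative assumptions, not the paper's actual implementation or data.

```python
from typing import Dict


def fuse_log_likelihoods(
    ll_bottleneck: Dict[str, float],
    ll_dae: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Linearly combine per-speaker log-likelihoods from two systems.

    Assumption (not stated in the abstract): fusion is a weighted sum, where
    `alpha` weights the bottleneck-feature system and (1 - alpha) weights the
    DAE-dereverberated system. `alpha` is a hypothetical tuning parameter.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if ll_bottleneck.keys() != ll_dae.keys():
        raise ValueError("both systems must score the same set of speakers")
    return {
        spk: alpha * ll_bottleneck[spk] + (1.0 - alpha) * ll_dae[spk]
        for spk in ll_bottleneck
    }


def identify_speaker(scores: Dict[str, float]) -> str:
    """Return the enrolled speaker with the highest (fused) log-likelihood."""
    return max(scores, key=scores.get)


def relative_error_reduction(baseline_err: float, new_err: float) -> float:
    """Fraction by which new_err reduces baseline_err (e.g. 0.47 means 47%)."""
    if baseline_err <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_err - new_err) / baseline_err


# Hypothetical scores for one test utterance and two enrolled speakers.
bottleneck_scores = {"spk1": -10.0, "spk2": -12.0}
dae_scores = {"spk1": -15.0, "spk2": -11.0}
fused = fuse_log_likelihoods(bottleneck_scores, dae_scores, alpha=0.5)
print(fused, identify_speaker(fused))
```

With equal weights, the DAE system's larger margin in favor of `spk2` outweighs the bottleneck system's smaller margin in favor of `spk1`, so the fused decision is `spk2`. Setting `alpha = 1.0` recovers the bottleneck-only decision. In practice such a weight would be tuned on held-out data.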

Auditory Features For The Close Talk Speech Enhancement With Parameter Masks

Parameter Masks for Close Talk Speech Segregation Using Deep Neural Networks

Dual-microphone speech enhancement algorithm based on the auditory features for a close-talk system

A DNN Parameter Mask for the Binaural Reverberant Speech Segregation

A Dual Microphone Speech Enhancement Method With A Smoothing Parameter Mask

Energy Difference Based Speech Segregation for Close-Talk System

A Dual-Microphone Speech Enhancement Algorithm for Close-Talk System

An Auditory-Based Monaural Feature for Noisy and Reverberant Speech Enhancement

Using Energy Difference for Speech Separation of Dual-microphone Close-talk System

A Binaural Deep Neural Networks Parameter Mask for the Robust Automatic Speech Recognition System

Binaural Deep Neural Network for Robust Speech Enhancement

Auditory Feature for Monaural Speech Segregation

Using an Adjustment Training and a Smoothing Mask for Speech Segregation

Deep Neural Network Based Noised Asian Speech Enhancement and Its Implementation on a Hearing Aid App.

A Feature Integration Network for Multi-Channel Speech Enhancement

Masks Fusion with Multi-Target Learning For Speech Enhancement

Auditory Model Based Speech Feature Extraction and Its Application to Speaker Identification

Joint Noise and Mask Aware Training for DNN-based Speech Enhancement with SUB-band Features

A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation

Deep Neural Network-Based Bottleneck Feature and Denoising Autoencoder-Based Dereverberation for Distant-Talking Speaker Identification.

Speech Enhancement Based on Binaural Sound Source Localization and Cosh Measure Wiener Filtering