Abstract: Deep neural network (DNN)-based approaches have proven effective in many automatic speech recognition systems, but few works have applied DNNs to distant-talking speaker recognition. In this study, we present a DNN-derived bottleneck feature and a cepstral-domain denoising autoencoder (DAE)-based dereverberation method for distant-talking speaker identification, and we propose a combination of the two. The DNN-based bottleneck feature transforms the reverberant speech feature into a new feature space with greater discriminative ability for distant-talking speaker recognition. In contrast, cepstral-domain DAE-based dereverberation suppresses reverberation by mapping the cepstrum of reverberant speech to that of clean speech, with the expectation of improving distant-talking speaker recognition performance. Because the discriminative DNN-based bottleneck feature and DAE-based dereverberation are strongly complementary, combining them is expected to be very effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set whose reverberant environments differed from the training environments. In suppressing late reverberation, our method outperformed some state-of-the-art dereverberation approaches, such as multichannel least mean squares (MCLMS). Compared with MCLMS, we obtained relative error rate reductions of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, combining the likelihoods of the DNN-based bottleneck feature and DAE-based dereverberation further improved performance.
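
The likelihood combination mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say how the likelihoods are combined, so the weighted sum of per-speaker log-likelihoods below is an assumption, as are the weight `alpha`, the function names, and the scores, which are made-up values chosen only for illustration.

```python
def combine_log_likelihoods(ll_bottleneck, ll_dae, alpha=0.5):
    """Weighted sum of per-speaker log-likelihoods from two systems.

    ll_bottleneck, ll_dae: dicts mapping speaker ID -> log-likelihood.
    alpha: weight on the bottleneck system (assumed, not from the paper).
    """
    if ll_bottleneck.keys() != ll_dae.keys():
        raise ValueError("both systems must score the same set of speakers")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        spk: alpha * ll_bottleneck[spk] + (1.0 - alpha) * ll_dae[spk]
        for spk in ll_bottleneck
    }


def identify_speaker(scores):
    """Return the speaker ID with the highest (combined) log-likelihood."""
    return max(scores, key=scores.get)


# Hypothetical per-speaker scores for one test utterance.
ll_bn = {"spk_a": -41.0, "spk_b": -40.0, "spk_c": -45.0}   # bottleneck system
ll_dae = {"spk_a": -38.0, "spk_b": -42.0, "spk_c": -44.0}  # DAE system

combined = combine_log_likelihoods(ll_bn, ll_dae, alpha=0.5)
best = identify_speaker(combined)
```

With these made-up scores the two systems disagree when used alone (the bottleneck system prefers `spk_b`, the DAE system prefers `spk_a`), and the combined score settles the decision. In practice a weight such as `alpha` would be tuned on held-out data.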

Deep Neural Network-Based Bottleneck Feature and Denoising Autoencoder-Based Dereverberation for Distant-Talking Speaker Identification.
