Abstract: Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral-domain denoising autoencoder (DAE)-based dereverberation method are presented for distant-talking speaker identification, and a combination of the two approaches is proposed. The DNN-based bottleneck feature transforms the reverberant speech feature into a new feature space that is more discriminative for distant-talking speaker recognition. In contrast, cepstral-domain DAE-based dereverberation suppresses reverberation by mapping the cepstrum of reverberant speech to that of clean speech, with the expectation of improving distant-talking speaker recognition performance. Because the DNN-based discriminative bottleneck feature and DAE-based dereverberation are strongly complementary, their combination is expected to be effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set in which the reverberant environments differed from the training environments. In suppressing late reverberation, our method outperformed some state-of-the-art dereverberation approaches such as multichannel least mean squares (MCLMS). Compared with MCLMS, we obtained relative error rate reductions of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, combining the likelihoods of the DNN-based bottleneck feature and DAE-based dereverberation further improved performance.
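
The abstract says only that the likelihoods of the two systems are combined, not how. As a minimal sketch, assuming a weighted linear combination of per-speaker log-likelihoods (a common score-fusion scheme, not necessarily the authors' method), the idea can be illustrated as follows. The function names, the weight `alpha`, and the example scores are all hypothetical.

```python
def fuse_log_likelihoods(ll_bottleneck, ll_dae, alpha=0.5):
    """Linearly combine per-speaker log-likelihoods from two systems.

    ll_bottleneck, ll_dae: dicts mapping speaker ID -> log-likelihood.
    alpha: assumed fusion weight in [0, 1] given to the bottleneck system.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if ll_bottleneck.keys() != ll_dae.keys():
        raise ValueError("both systems must score the same speakers")
    return {
        spk: alpha * ll_bottleneck[spk] + (1.0 - alpha) * ll_dae[spk]
        for spk in ll_bottleneck
    }


def identify_speaker(ll_bottleneck, ll_dae, alpha=0.5):
    """Return the speaker ID with the highest fused log-likelihood."""
    fused = fuse_log_likelihoods(ll_bottleneck, ll_dae, alpha)
    return max(fused, key=fused.get)


# Made-up scores for two enrolled speakers (illustration only).
scores_bottleneck = {"spk_a": -10.0, "spk_b": -12.0}
scores_dae = {"spk_a": -15.0, "spk_b": -9.0}
best = identify_speaker(scores_bottleneck, scores_dae, alpha=0.5)
```

In a sketch like this, `alpha` would normally be tuned on held-out data; setting it to 1.0 or 0.0 falls back to one system alone.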

Deep Neural Network Derived Bottleneck Features For Accurate Audio Classification

Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network.

Semi-supervised Learning of Bottleneck Feature for Music Genre Classification.

A Deep Neural Network for Audio Classification with a Classifier Attention Mechanism

Speaker Recognition System Based on Deep Neural Networks and Bottleneck Features

Deep Neural Network-Based Bottleneck Feature and Denoising Autoencoder-Based Dereverberation for Distant-Talking Speaker Identification.

Music Feature Extraction and Classification Algorithm Based on Deep Learning

Adaptive DCTNet for Audio Signal Classification

Bipolar Population Threshold Encoding for Audio Recognition with Deep Spiking Neural Networks

Deep Neural Network Based Environment Sound Classification and Its Implementation on Hearing Aid App

Using Deep Belief Network to Capture Temporal Information for Audio Event Classification.

Improved Bottleneck Feature Using Hierarchical Deep Belief Networks for Keyword Spotting in Continues Speech

Deep Neural Network Based Noised Asian Speech Enhancement and Its Implementation on a Hearing Aid App.

Audio-Based Music Classification with DenseNet And Data Augmentation

Investigation Of Bottleneck Features And Multilingual Deep Neural Networks For Speaker Verification

Deep Speaker Feature Learning for Text-independent Speaker Verification

Improving Blstm Rnn Based Mandarin Speech Recognition Using Accent Dependent Bottleneck Features

A regression approach to speech enhancement based on deep neural networks

Advanced Framework for Animal Sound Classification With Features Optimization

Bottleneck Features Based On Gammatone Frequency Cepstral Coefficients

A Novel Pitch Extraction Based on Jointly Trained Deep BLSTM Recurrent Neural Networks with Bottleneck Features