Abstract: Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a combination of these two approaches is proposed. For the DNN-based bottleneck feature, we note that DNNs can transform the reverberant speech feature into a new feature space with greater discriminative ability for distant-talking speaker recognition. In contrast, cepstral domain DAE-based dereverberation suppresses reverberation by mapping the cepstrum of reverberant speech to that of clean speech, with the expectation of improving the performance of distant-talking speaker recognition. Since the DNN-based discriminant bottleneck feature and DAE-based dereverberation are strongly complementary, the combination of these two methods is expected to be very effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set, with reverberant environments differing from the training environments. In suppressing late reverberation, our method outperformed state-of-the-art dereverberation approaches such as the multichannel least mean squares (MCLMS) method. Compared with MCLMS, we obtained relative error rate reductions of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, combining the likelihoods of the DNN-based bottleneck feature and DAE-based dereverberation further improved the performance.
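The abstract does not say how the two likelihoods are combined. As a minimal sketch only, assuming a weighted sum of per-speaker log-likelihoods (a common score-level fusion scheme, not necessarily the one used in the paper), the combination step could look like the following. The function name `fuse_and_identify`, the `weight` parameter, and the example scores are all hypothetical.

```python
def fuse_and_identify(ll_bottleneck, ll_dae, weight=0.5):
    """Fuse two per-speaker log-likelihood tables and pick the best speaker.

    ll_bottleneck, ll_dae: dicts mapping speaker ID -> log-likelihood from the
    bottleneck-feature system and the DAE-dereverberation system respectively.
    weight: assumed interpolation weight for the bottleneck system (0..1).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if set(ll_bottleneck) != set(ll_dae):
        raise ValueError("both systems must score the same speakers")

    # Weighted sum of log-likelihoods for each candidate speaker.
    fused = {
        spk: weight * ll_bottleneck[spk] + (1.0 - weight) * ll_dae[spk]
        for spk in ll_bottleneck
    }
    # Identification: the speaker with the highest fused score.
    best = max(fused, key=fused.get)
    return best, fused


# Made-up scores for illustration only; not results from the paper.
scores_bn = {"spk_a": -10.0, "spk_b": -12.0}
scores_dae = {"spk_a": -15.0, "spk_b": -9.0}
best_speaker, fused_scores = fuse_and_identify(scores_bn, scores_dae, weight=0.5)
```

In this toy case the two systems disagree, and the fused score decides between them. In practice the weight would be tuned on held-out data.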

Speaker Segmentation Based on Sparse Neural Network

Speaker Segmentation and Clustering Based on the Improved Spectral Clustering

Bipolar Population Threshold Encoding for Audio Recognition with Deep Spiking Neural Networks

Self-attention Based Speaker Recognition Using Cluster-Range Loss

Deep Neural Network-Based Bottleneck Feature and Denoising Autoencoder-Based Dereverberation for Distant-Talking Speaker Identification

Speaker Recognition System Based on Deep Neural Networks and Bottleneck Features

SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation

Speaker Segmentation Using Deep Speaker Vectors For Fast Speaker Change Scenarios

Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet

Speaker Recognition Based on Pre-Trained Model and Deep Clustering

A Unified Speaker-Dependent Speech Separation and Enhancement System Based on Deep Neural Networks

A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech

Speaker Classification Algorithm Based on Spatial Acoustic Feature

Neuron Sparseness Versus Connection Sparseness in Deep Neural Network for Large Vocabulary Speech Recognition

Speech separation of a target speaker based on deep neural networks

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

A Speaker-Dependent Deep Learning Approach to Joint Speech Separation and Acoustic Modeling for Multi-Talker Automatic Speech Recognition

Spectral Conversion Using Deep Neural Networks Trained with Multi-Source Speakers

Neural Scoring, Not Embedding: A Novel Framework for Robust Speaker Verification

Speaker Identification System Based on Hybrid Neural Network

Few-Shot Speaker Identification Using Lightweight Prototypical Network With Feature Grouping and Interaction