Abstract: Recently, the integration of deep neural networks (DNNs) with i-vector systems has proved effective for speaker verification. This method uses a DNN with senone outputs to produce frame alignments for sufficient statistics extraction. However, two types of data mismatch may degrade the performance of DNN-based speaker verification systems. First, the DNN requires transcribed training data, while the data sets used for i-vector training and extraction are mostly untranscribed. Second, the language of the DNN training data is limited by the pronunciation lexicon, making the model unsuitable for multilingual tasks. In this paper, we propose to use bottleneck features and multilingual DNNs to narrow the gap caused by the data mismatch. In our method, a DNN is first trained with senone labels to extract bottleneck features. Then a Gaussian mixture model (GMM) is trained on the bottleneck features to produce frame alignments. Additionally, bottleneck features based on multilingual DNNs are explored for multilingual speaker verification. Experiments on the NIST SRE 2008 female short2-short3 telephone task (multilingual) and the NIST SRE 2010 female core-extended telephone task (English) demonstrate the effectiveness of the proposed method.
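
The frame-alignment step described in the abstract can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM whose parameters are already trained, and it uses random toy vectors in place of real bottleneck features. The function names (`gmm_frame_posteriors`, `sufficient_statistics`) and all dimensions are hypothetical choices made for illustration, not the authors' implementation. The sketch computes per-frame component posteriors (the frame alignments) and accumulates the zeroth- and first-order statistics that an i-vector extractor would consume.

```python
import numpy as np


def gmm_frame_posteriors(feats, weights, means, variances):
    """Per-frame posteriors of a diagonal-covariance GMM.

    feats: (T, D) frames; weights: (C,); means, variances: (C, D).
    Returns a (T, C) array whose rows sum to 1.
    """
    diff = feats[:, None, :] - means[None, :, :]              # (T, C, D)
    # Log density of each frame under each diagonal Gaussian.
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )                                                          # (T, C)
    log_joint = np.log(weights)[None, :] + log_gauss
    # Subtract the row maximum before exponentiating for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def sufficient_statistics(feats, posteriors):
    """Zeroth-order (C,) and first-order (C, D) statistics."""
    zeroth = posteriors.sum(axis=0)
    first = posteriors.T @ feats
    return zeroth, first


# Toy stand-ins (assumptions): 50 frames of 4-dimensional "bottleneck"
# features and a 3-component GMM with arbitrary parameters.
rng = np.random.default_rng(0)
bottleneck = rng.normal(size=(50, 4))
weights = np.full(3, 1.0 / 3.0)
means = rng.normal(size=(3, 4))
variances = np.ones((3, 4))

post = gmm_frame_posteriors(bottleneck, weights, means, variances)
zeroth, first = sufficient_statistics(bottleneck, post)
```

In this sketch the same features are used both for alignment and for accumulating statistics; a real system might align with bottleneck features and accumulate statistics over a different feature stream, which would only change the `feats` argument passed to `sufficient_statistics`.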

Speaker Identification Using Wavelet Shannon Entropy and Probabilistic Neural Network

Speaker Recognition on Mobile Phone: Using Wavelet, Cepstral Coefficients and Probabilistic Neural Network

Speaker Recognition Using Wavelet Packet Entropy, I-Vector, and Cosine Distance Scoring

Forensic speaker recognition model using wavelet cepstral coefficients and probabilistic neural network

Speaker Identification System Based on Hybrid Neural Network

Speaker Recognition Using Wavelet Cepstral Coefficient, I-Vector, and Cosine Distance Scoring and Its Application for Forensics

Speaker Identification Based on Classify Feature Sub-space Gaussian Mixture Model and Neural Net Fusion

Robust Speaker Identification Using An Auditory-Based Feature

Hierarchical Speaker Identification under Noisy Environments

Hybrid Architecture Based on Fuzzy Classifier and Multilayer Feed-Forward Neural Network for Speaker Identification

Emotional speaker recognition based on similar neighbor phenomenon

Speaker Recognition System Based on Deep Neural Networks and Bottleneck Features

Wavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models

Identity Vector Extraction by Perceptual Wavelet Packet Entropy and Convolutional Neural Network for Voice Authentication

Robust Speaker Identification In Noise Using Missing Data Technique And Auditory Masking

Speaker Recognition with Little Data Based on Fuzzy Kernel Entropy

Improving Noise Robustness In Speaker Identification Using A Two-Stage Attention Model

Speaker Identification based on LSP and Gaussian Mixture Model

Investigation Of Bottleneck Features And Multilingual Deep Neural Networks For Speaker Verification

Speaker identification and localization using shuffled MFCC features and deep learning

Speaker Identification from emotional and noisy speech data using learned voice segregation and Speech VGG