Abstract: The performance of i-vectors/PLDA based speaker verification systems is known to degrade with short utterances and limited training data. The degradation arises for two reasons: the shorter the utterance, the less reliable the extracted i-vector, and the total variability covariance matrix and the underlying PLDA matrices need a significant amount of data to be estimated robustly. Taking the “MIT Mobile Device Speaker Verification Corpus” (MIT-MDSVC) as a representative dataset for robust speaker verification with a limited amount of training data, this paper investigates which configuration and which parameters lead to the best performance of an i-vectors/PLDA based speaker verification system. The i-vectors/PLDA based system achieved good performance only when the total variability matrix and the underlying PLDA matrices were trained with data belonging to the enrolled speakers. Training in this way means that the system must be fully retrained whenever new speakers are enrolled. The performance of the system was more sensitive to the amount of training data for the underlying PLDA matrices than to the amount of training data for the total variability matrix. Overall, the Equal Error Rate performance of the i-vectors/PLDA based system was around 1% below that of a GMM-UBM system on the chosen dataset. Finally, the paper presents preliminary experiments in which utterances from the CSTR VCTK corpus were used alongside utterances from MIT-MDSVC to train the total variability covariance matrix and the underlying PLDA matrices.
