Front-End Factor Analysis For Speaker Verification

Florin Curelaru
DOI: https://doi.org/10.1109/iccomm.2018.8453731
2018-06-01
Abstract: The performance of i-vector/PLDA-based speaker verification systems is known to degrade with short utterances and limited training data. The degradation occurs because the shorter the utterance, the less reliable the extracted i-vector, and because the total variability covariance matrix and the underlying PLDA matrices need a significant amount of data to be estimated robustly. Taking the “MIT Mobile Device Speaker Verification Corpus” (MIT-MDSVC) as a representative dataset for robust speaker verification with a limited amount of training data, this paper investigates which configuration and which parameters lead to the best performance of an i-vector/PLDA-based speaker verification system. The i-vector/PLDA-based system achieved good performance only when the total variability matrix and the underlying PLDA matrices were trained with data belonging to the enrolled speakers. Training in this way means that the system must be fully retrained whenever new speakers are enrolled. The performance of the system was more sensitive to the amount of training data for the underlying PLDA matrices than to the amount of training data for the total variability matrix. Overall, the Equal Error Rate performance of the i-vector/PLDA-based system was around 1% below that of a GMM-UBM system on the chosen dataset. The paper closes with some preliminary experiments in which utterances from the CSTR VCTK corpus were used alongside utterances from MIT-MDSVC to train the total variability covariance matrix and the underlying PLDA matrices.
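The Equal Error Rate used in the abstract as the comparison metric is the operating point at which the false-acceptance rate and the false-rejection rate coincide. The following is a minimal sketch of how it might be computed from verification trial scores, assuming that higher scores mean "same speaker". The function name `equal_error_rate` and the example scores are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every score seen in either trial set.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False-rejection rate: fraction of genuine trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False-acceptance rate: fraction of impostor trials scored at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Hypothetical scores, for illustration only.
genuine = [2.0, 3.0, 4.0, 5.0]
impostor = [0.0, 1.0, 1.5, 2.5]
eer, threshold = equal_error_rate(genuine, impostor)
```

With real trial lists the two error curves rarely cross exactly at an observed score, so averaging the two rates at the closest point is a common approximation; interpolating between neighbouring thresholds would give a finer estimate.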