Task-Driven Variability Model for Speaker Verification.

Chen,Jiqing Han
DOI: https://doi.org/10.1007/s00034-019-01315-7
IF: 2.311
2019-01-01
Circuits Systems and Signal Processing
Abstract:The total variability model (TVM)/probabilistic linear discriminant analysis (PLDA) framework is one of the most popular methods for speaker verification. In this framework, the i-vector representations are first extracted from utterances via an estimated TVM and then employed to estimate the PLDA parameters for classification. The TVM and PLDA are estimated serially, so the information loss in the TVM is inherited by the i-vectors, and then passed into the PLDA classifier. More seriously, this loss cannot be compensated by the PLDA. To solve this problem, we propose a task-driven variability model (TDVM) to jointly estimate the TVM and PLDA classifier. In this method, the feedback from the PLDA can supervise the optimal solution of the TVM to move toward the space that has the maximum between-class separation and minimum within-class variation. Meanwhile, this space is suitable for open-set test which can deal with unenrolled speakers. Unlike most embedding methods which extract the embedding representations via the stack of network structures, the TDVM contains the assumptions about latent variables, which can enhance the interpretation of speaker representation extraction. The proposed method is evaluated on the King-ASR-010 and VoxCeleb databases, and the experimental results show that the TDVM method can achieve better performance than the traditional TVM/PLDA and VGG-M network with different cost functions.
What problem does this paper attempt to address?