Multi-view (Joint) probability linear discrimination analysis for J-vector based text dependent speaker verification

Ziqiang Shi, L. Liu, Mengjiao Wang, Rujie Liu
DOI: https://doi.org/10.1109/ASRU.2017.8268993
2017-12-01
Abstract: J-vectors have proven very effective for text-dependent speaker verification with short-duration speech. However, current back-end classifiers cannot make full use of such deep features. In this paper, we propose a method that models the multi-faceted information in the j-vector explicitly and jointly; examples of such information include speaker identity and text content. In our approach, the j-vector is modeled as generated by a multi-view (joint) Probability Linear Discriminant Analysis (PLDA) model, which contains multiple kinds of latent variables. The usual PLDA model considers only a single label. In practice, however, when a multi-task-learned network is used as the feature extractor, each extracted feature is associated with several labels. Such features are called multi-view deep features (e.g., j-vectors). With multi-view (joint) PLDA, we can explicitly build a model that combines multiple kinds of heterogeneous information from the j-vectors. In the verification step, we compute the likelihood that the two j-vectors have consistent labels versus inconsistent ones, and this likelihood is then used to make the decision. Experiments were conducted on large-scale corpora in different languages. On the public RSR2015 corpus, our approach achieves EERs of 0.02% and 0.09% for the impostor-wrong and impostor-correct cases, respectively.
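The scoring idea can be illustrated with a minimal numpy sketch. It assumes, as an illustration rather than the authors' exact formulation, a joint PLDA generative form x = mu + U s + V t + eps, where s is a speaker latent variable, t a text latent variable (both standard normal), and eps Gaussian noise with covariance Sigma. Under that assumption, a pair of j-vectors is jointly Gaussian, and sharing a speaker or a text label only changes the cross-covariance between the two vectors. The function names `pair_cov`, `gauss_logpdf` and `jplda_llr` are hypothetical. The parameters are random rather than trained (EM training is not shown), and treating the "inconsistent labels" hypothesis as an equal-prior mixture of the three mismatch cases is also an assumption.

```python
import numpy as np

# Assumed joint PLDA form: x = mu + U @ s + V @ t + eps,
# s ~ N(0, I) speaker latent, t ~ N(0, I) text latent, eps ~ N(0, Sigma).

def pair_cov(U, V, Sigma, same_spk, same_txt):
    """Covariance of the stacked pair [x1; x2] under one label hypothesis."""
    S = U @ U.T                      # between-speaker covariance
    T = V @ V.T                      # between-text covariance
    tot = S + T + Sigma              # marginal covariance of one j-vector
    cross = np.zeros_like(S)         # shared latents induce cross-covariance
    if same_spk:
        cross = cross + S
    if same_txt:
        cross = cross + T
    return np.block([[tot, cross], [cross, tot]])

def gauss_logpdf(z, C):
    """Log density of a zero-mean Gaussian with covariance C at z."""
    _, logdet = np.linalg.slogdet(C)
    quad = z @ np.linalg.solve(C, z)
    return -0.5 * (z.size * np.log(2.0 * np.pi) + logdet + quad)

def jplda_llr(x1, x2, mu, U, V, Sigma):
    """Log-likelihood ratio: both labels match vs. at least one differs.

    The alternative is an equal-prior mixture of the three mismatch
    hypotheses; this weighting is an assumption of the sketch.
    """
    z = np.concatenate([x1 - mu, x2 - mu])
    target = gauss_logpdf(z, pair_cov(U, V, Sigma, True, True))
    others = np.array([gauss_logpdf(z, pair_cov(U, V, Sigma, s, t))
                       for s, t in [(True, False), (False, True), (False, False)]])
    m = others.max()                 # log-mean-exp for numerical stability
    log_alt = m + np.log(np.mean(np.exp(others - m)))
    return target - log_alt

# Toy demo with random (untrained) parameters, not real j-vectors.
rng = np.random.default_rng(0)
D = 10
mu = np.zeros(D)
U = 2.0 * rng.normal(size=(D, 3))    # hypothetical speaker loading matrix
V = 2.0 * rng.normal(size=(D, 2))    # hypothetical text loading matrix
Sigma = 0.1 * np.eye(D)

def sample(s, t):
    """Draw one synthetic j-vector for latent speaker s and text t."""
    return mu + U @ s + V @ t + rng.multivariate_normal(np.zeros(D), Sigma)

s1, t1 = rng.normal(size=3), rng.normal(size=2)
s2, t2 = rng.normal(size=3), rng.normal(size=2)
x1, x2 = sample(s1, t1), sample(s1, t1)   # same speaker, same phrase
x_imp = sample(s2, t2)                     # impostor speaking a wrong phrase
target_score = jplda_llr(x1, x2, mu, U, V, Sigma)
impostor_score = jplda_llr(x1, x_imp, mu, U, V, Sigma)
print(f"target LLR {target_score:.1f}, impostor LLR {impostor_score:.1f}")
```

Because the speaker and text subspaces enter the cross-covariance separately, other decision rules (for example, accepting a matching speaker regardless of phrase) could be sketched by changing the flags passed to `pair_cov`; which hypotheses the paper actually compares is not stated in the abstract.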