Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification

Bing Jiang,Yan Song,Si Wei,Meng-Ge Wang,Ian McLoughlin,Li-Rong Dai
DOI: https://doi.org/10.1109/iscslp.2014.6936580
2014-01-01
Abstract:Our previous work has shown that Deep Bottleneck Features (DBF), generated from a well-trained Deep Neural Network (DNN), can provide high performance Language Identification (LID) when Total Variability (TV) modelling is used for a back-end. This may largely be attributed to the powerful capability of the DNN for finding a frame-level representation which is robust to variances caused by different speakers, channels and background noise. However the DBF in the previous work were extracted from a DNN that was trained using a large ASR dataset. Optimal LID DBF parameters may differ from those that are known to be optimal for ASR. Thus this paper focuses on investigating different DBF extractors, input layer window sizes and dimensionality, and bottleneck layer location. Additionally, principal component analysis (PCA) is used to decorrelate the DBF. Experiments, based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM) operating on the NIST LRE 2009 database, are conducted to evaluate the system. Results allow comparison between different DBF extractor parameters, as well as demonstrating that LID based on DBF can significantly outperform the conventional shift delta cepstral (SDC) features.
What problem does this paper attempt to address?