Task-aware Deep Bottleneck Features for Spoken Language Identification
Bing Jiang,Yan Song,Si Wei,Ian McLoughlin,Li-Rong Dai
DOI: https://doi.org/10.21437/interspeech.2014-604
2014-01-01
Abstract:Recently, deep bottleneck features (DBF) extracted from a deep neural network (DNN) containing a narrow bottleneck layer, have been applied for language identification (LID), and yield significant performance improvement over state-of-the-art methods on NIST LRE 2009. However, the DNN is trained using a large corpus of specific language which is not directly related to the LID task. More recently, lattice based discriminative training methods for extracting more targeted DBF were proposed for ASR. Inspired by this, this paper proposes to tune the post-trained DNN parameters using an LID-specific training corpus, which may make the resulting DBF, termed a Discriminative DBF (D2BF), more discriminative and task-aware. Specifically, the maximum mutual information (MMI) criterion, with gradient descent, is applied to update the DNN parameters of the bottleneck layer in an iterative fashion. We evaluate the performance of the proposed D2BF using different back-end models, including GMM-MMI and ivector, over the most confused 6-languages selected from NIST LRE 2009. The results show that the proposed D2BF is more appropriate and effective than the original DBF. Index Terms: language identification, deep bottleneck feature, deep neural network, discriminative training, Gaussian mixture model, maximum mutual information