Abstract: Phonotactic and acoustic systems are currently the most widely used approaches to spoken language recognition (SLR). Parallel phone recognition followed by vector space modeling (PPRVSM) is a typical phonotactic SLR system. To improve performance, researchers have sought to extract more complementary information from the training data by using multiple language-specific phone recognizers, different acoustic models and different acoustic features. These methods perform well, but they usually incur a high computational cost and exploit only the complementary information in the training data. In this paper, we explore a novel approach to discriminative vector space model (VSM) training that uses a boosting framework, in which an ensemble of VSMs is trained sequentially, to exploit the discriminative information in the test data effectively. The effectiveness of our boosting variant comes from its emphasis on high-confidence test data, which yields discriminatively trained models. Our variant of boosting also uses the original training data in VSM training. The discriminative boosting algorithm (DBA) is applied to the National Institute of Standards and Technology (NIST) language recognition evaluation (LRE) 2009 task and shows performance improvements. The experimental results demonstrate that, compared with the baseline system, the proposed DBA achieves relative reductions in equal error rate (EER) of 1.8 %, 11.72 % and 15.35 % for 30 s, 10 s and 3 s test utterances, respectively.
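The abstract names three ingredients: an ensemble trained sequentially, an emphasis on high-confidence test data, and continued use of the original training data. The following is a minimal sketch of a loop with those three properties, not the DBA itself. It rests on several assumptions that are not in the abstract: a nearest-centroid classifier stands in for the VSM back end, confidence is taken to be the score margin between the two best languages, and the names `centroid_model`, `predict`, `boosted_ensemble`, `rounds` and `threshold` are all hypothetical.

```python
import math
from collections import Counter


def centroid_model(vectors, labels):
    """Hypothetical stand-in for one VSM: a mean vector per language."""
    sums, counts = {}, Counter()
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(models, vec):
    """Average per-language scores over the ensemble.

    Returns (best_language, margin), where margin is the gap between the
    two best scores. Using the margin as confidence is an assumption.
    Requires at least two languages.
    """
    langs = list(models[0])
    scores = {
        lab: sum(-math.dist(m[lab], vec) for m in models) / len(models)
        for lab in langs
    }
    ranked = sorted(langs, key=scores.get, reverse=True)
    return ranked[0], scores[ranked[0]] - scores[ranked[1]]


def boosted_ensemble(train_x, train_y, test_x, rounds=3, threshold=1.0):
    """Train models one after another (assumed procedure, not the paper's).

    Each new model sees the original training data plus the test vectors
    that the current ensemble labels with a margin >= threshold.
    """
    models = [centroid_model(train_x, train_y)]
    for _ in range(rounds - 1):
        extra_x, extra_y = [], []
        for vec in test_x:
            lab, margin = predict(models, vec)
            if margin >= threshold:  # keep only high-confidence test data
                extra_x.append(vec)
                extra_y.append(lab)
        models.append(centroid_model(train_x + extra_x, train_y + extra_y))
    return models
```

With toy two-dimensional vectors, `boosted_ensemble(train_x, train_y, test_x)` returns a list of `rounds` models, and `predict(models, vec)` scores a new vector against the whole ensemble. A test vector that lies equally far from every language centroid has a margin of zero, so it is never added to the training pool.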

Discriminative Boosting Algorithm for Diversified Front-End Phonotactic Language Recognition