Abstract: One typical phonotactic system for language recognition is parallel phone recognition followed by vector space modeling (PPRVSM). In this system, several phone recognizers are applied in parallel and their outputs are fused at the score level. Each phone recognizer is trained on a different known language, on the assumption that recognizers for different languages extract complementary information for effective fusion. However, this method requires a large amount of training data with word-level or phone-level transcriptions. Moreover, score-level fusion is not optimal, because fusion at the feature or model level retains more information than fusion at the score level. This paper presents a new strategy to build and fuse parallel phone recognizers (PPR): multiple acoustic diversified phone recognizers are trained and then fused at the feature level. The phone recognizers are trained on the same speech data but with different acoustic features and model training techniques. For the acoustic features, both Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) are employed. In addition, a new time-frequency cepstrum (TFC) feature is proposed to extract complementary acoustic information. For the model training, we examine the maximum likelihood and feature minimum phone error methods as ways to train complementary acoustic models. We fuse the phonotactic features of the acoustic diversified phone recognizers with a simple linear fusion method to build the PPRVSM system. A novel logistic regression optimized weighting (LROW) approach is introduced to optimize the fusion factors. The experimental results show that fusion at the feature level is more effective than fusion at the score level, and that the proposed system is competitive with the traditional PPRVSM. Finally, the two systems are combined for further improvement. The best-performing system reported in this paper achieves equal error rates (EERs) of 1.24%, 4.98% and 14.96% on the NIST 2007 LRE 30-second, 10-second and 3-second evaluation databases, respectively, for the closed-set test condition.
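
As an illustration of the feature-level linear fusion mentioned in the abstract, the following is a minimal Python sketch. It is not the authors' implementation. It assumes that every phone recognizer yields a phonotactic feature vector of the same dimension (for example, normalized phone n-gram statistics over a shared phone set) and that fusion is a weighted sum with non-negative weights normalized to sum to one. The function name `fuse_phonotactic_features`, the toy vectors, and the weight values are hypothetical; in the paper the fusion factors are optimized by LROW, which is not reproduced here.

```python
import numpy as np


def fuse_phonotactic_features(features, weights):
    """Weighted linear combination of per-recognizer feature vectors.

    features: K vectors of equal length D, one per phone recognizer
              (assumption: all recognizers share the same feature space).
    weights:  K non-negative fusion factors (hypothetical values here;
              the paper optimizes them with LROW).
    """
    feats = np.asarray(features, dtype=float)  # shape (K, D)
    w = np.asarray(weights, dtype=float)       # shape (K,)
    if feats.ndim != 2 or w.shape != (feats.shape[0],):
        raise ValueError("need K feature vectors of equal length and K weights")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()   # normalize so the fusion factors sum to one
    return w @ feats  # fused vector of shape (D,)


# Toy 3-dimensional feature vectors standing in for the MFCC-, PLP- and
# TFC-based recognizers (made-up numbers, for illustration only).
mfcc_feat = np.array([0.5, 0.3, 0.2])
plp_feat = np.array([0.4, 0.4, 0.2])
tfc_feat = np.array([0.6, 0.2, 0.2])

fused = fuse_phonotactic_features(
    [mfcc_feat, plp_feat, tfc_feat], [0.5, 0.25, 0.25]
)
print(fused)
```

Under these assumptions the fused vector would then be passed to a single vector space model, in contrast to score-level fusion, where each recognizer has its own back-end classifier and only the resulting scores are combined.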

Autosegmental Neural Nets 2.0: An Extensive Study of Training Synchronous and Asynchronous Phones and Tones for Under-Resourced Tonal Languages

Linguistic Feedback Supports Rapid Adaptation to Acoustically Degraded Speech

End-to-end Code-switched TTS with Mix of Monolingual Recordings

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

Label Transform Based Cross-Language Speaker Adaptation in Bilingual (Mandarin-English) TTS

An Improved Cross-Language Model Adaptation Method for Speech Synthesis

Tri-stage training with language-specific encoder and bilingual acoustic learner for code-switching speech recognition

Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network

Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS)

Improving Audio-visual Speech Recognition Performance with Cross-modal Student-teacher Training

Language Recognition Based on Acoustic Diversified Phone Recognizers and Phonotactic Feature Fusion

Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation

Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning

Enhancing CTC-based speech recognition with diverse modeling units

Performance Improvements of Probabilistic Transcript-adapted ASR with Recurrent Neural Network and Language-specific Constraints

Almost Unsupervised Text to Speech and Automatic Speech Recognition

Enhancing Cross-lingual Transfer via Phonemic Transcription Integration

Speech Selection and Environmental Adaptation for Asynchronous Speech Recognition

Synchronising audio and ultrasound by learning cross-modal embeddings

The Study of Perceptual Training of Chinese Mandarin Tones for Monolingual Speakers of English Using Adaptive Computer Based Training Software