Abstract: A typical phonotactic system for language recognition is parallel phone recognition followed by vector space modeling (PPRVSM). In this system, several phone recognizers are run in parallel and fused at the score level. Each phone recognizer is trained for a known language and is assumed to extract complementary information for effective fusion. However, this method requires a large amount of training data with word- or phone-level transcription. Moreover, score fusion is not optimal, since fusion at the feature or model level retains more information than fusion at the score level. This paper presents a new strategy to build and fuse parallel phone recognizers (PPR): multiple acoustically diversified phone recognizers are trained and then fused at the feature level. The phone recognizers are trained on the same speech data but with different acoustic features and model training techniques. For the acoustic features, both Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) are employed. In addition, a new time-frequency cepstrum (TFC) feature is proposed to extract complementary acoustic information. For model training, we examine the maximum likelihood and feature minimum phone error methods as ways to train complementary acoustic models. We fuse the phonotactic features of the acoustically diversified phone recognizers using a simple linear fusion method to build the PPRVSM system. A novel logistic regression optimized weighting (LROW) approach is introduced to optimize the fusion factors. The experimental results show that fusion at the feature level is more effective than fusion at the score level, and that the proposed system is competitive with the traditional PPRVSM. Finally, the two systems are combined for further improvement. The best-performing system reported in this paper achieves equal error rates (EER) of 1.24%, 4.98%, and 14.96% on the NIST 2007 LRE 30-second, 10-second, and 3-second evaluation databases, respectively, under the closed-set test condition.
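
The feature-level linear fusion described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `bigram_probabilities` and `fuse_features`, the toy phone inventory, and the use of bigram relative frequencies as the phonotactic feature are all assumptions made for illustration. The sketch also assumes that every recognizer shares one phone inventory, so that their feature vectors have the same dimensionality and can be combined linearly. The fusion weights are fixed by hand here; the LROW optimization of those weights is not reproduced.

```python
import numpy as np


def bigram_probabilities(phones, inventory):
    """Return a flattened vector of phone-bigram relative frequencies.

    Hypothetical stand-in for a phonotactic feature extractor.
    """
    index = {p: i for i, p in enumerate(inventory)}
    n = len(inventory)
    counts = np.zeros((n, n))
    # Count each adjacent pair of phones in the decoded sequence.
    for a, b in zip(phones, phones[1:]):
        counts[index[a], index[b]] += 1
    total = counts.sum()
    return (counts / total).ravel() if total > 0 else counts.ravel()


def fuse_features(vectors, weights):
    """Linearly combine same-sized feature vectors with normalized weights."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # make the weights sum to one
    return np.tensordot(weights, np.stack(vectors), axes=1)


# Toy example: two recognizers decode the same utterance differently.
inventory = ["a", "b"]  # assumed shared phone inventory
decoded_1 = ["a", "b", "a", "b"]  # e.g. output of an MFCC-based recognizer
decoded_2 = ["a", "a", "b", "b"]  # e.g. output of a PLP-based recognizer

features = [bigram_probabilities(d, inventory) for d in (decoded_1, decoded_2)]
fused = fuse_features(features, weights=[1.0, 1.0])
print(fused)
```

The fused vector would then be passed to a single vector space model, in contrast to score-level fusion, where each recognizer's features feed a separate classifier and only the resulting scores are combined.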

Language Recognition Based on Acoustic Diversified Phone Recognizers and Phonotactic Feature Fusion
