Abstract: Acoustic spoken language recognition (SLR) and phonotactic SLR are currently two widely used approaches to language recognition. To improve performance, researchers combine multiple subsystems, and the combined result is often much better than that of a single SLR system. Phonotactic SLR subsystems may differ in their acoustic feature vectors, or may use multiple language-specific phone recognizers and different acoustic models. These methods perform well but usually incur a high computational cost. In this paper, we propose a new way to diversify phonotactic language recognition systems: building different vector space models through support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but each models the data in a different vector space obtained with the SSR algorithm, without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three SVM supervector reconstruction algorithms: relative SVM supervector reconstruction, functional SVM supervector reconstruction, and perturbing SVM supervector reconstruction. All three algorithms are combined through a linear discriminant analysis-maximum mutual information (LDA-MMI) backend to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system outperforms a baseline phone recognition-vector space modeling (PR-VSM) system at minimal extra computational cost. For the 30-, 10-, and 3-s test conditions, the HEPLR system achieves equal error rates (EERs) of 1.39%, 3.63%, and 14.79%, respectively, which represent relative improvements of 6.06%, 10.15%, and 10.53% over the baseline system.
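
The shared N-gram counting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the supervector is simply the vector of bigram relative frequencies over a fixed phone inventory. The function name `ngram_supervector`, the toy phone labels, and the lack of any weighting are illustrative assumptions; the abstract does not specify the exact features, and this is not the SSR algorithm itself.

```python
from collections import Counter
from itertools import product


def ngram_supervector(phones, inventory, n=2):
    """Map a decoded phone sequence to a vector of N-gram relative frequencies.

    Hypothetical helper for illustration only; the paper's actual
    feature weighting and SSR steps are not reproduced here.
    """
    # Slide a window of length n over the decoded phone sequence.
    grams = [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    # Use one fixed ordering over all possible N-grams so that every
    # utterance is mapped into the same vector space.
    keys = list(product(sorted(inventory), repeat=n))
    return [counts[k] / total if total else 0.0 for k in keys]


# Toy example: a three-phone inventory and a short decoded sequence.
phones = ["a", "b", "a", "b", "c"]
vec = ngram_supervector(phones, {"a", "b", "c"}, n=2)
```

Because every subsystem would consume the same vector, this step is computed once per utterance; under the architecture described above, only the later vector-space modeling differs between subsystems.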

Modeling method and modeling device for language identification

Recording Device Identification Based on Cepstral Mixed Features

Design and implementation of a speaker recognition system

Neural network language model training method and device and voice recognition method

High-resolution Acoustic Modeling and Compact Language Modeling of Language-Universal Speech Attributes for Spoken Language Identification

Automatic Language Identification Using Support Vector Machines and Phonetic N-gram

Silent Voice Input Identification Method, Computing Apparatus, and Computer-Readable Medium

A language model based approach towards large scale and lightweight language identification systems

HMM Modeling Based on Mandarin Phonemes in Embedded Systems

Deep neural network construction method for voice command word recognition and recognition method and device

Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

Homogenous Ensemble Phonotactic Language Recognition Based on SVM Supervector Reconstruction

Phonetic Temporal Neural Model for Language Identification

Investigating model performance in language identification: beyond simple error statistics

Estimating and detecting method and system for telephone continuous speech recognition system performance

LVCSR System for English and Mandarin Integrated with Language Identification

Method and device for determining pronunciation of polyphonic characters

Phoneme Modeling Units Design for Mandarin LVCSR Systems

Automatic Call Routing with Multiple Language Models

A New Robust Telephone Speech Recognition Algorithm With The Multi-Model Structures