Abstract: Acoustic and phonotactic spoken language recognition (SLR) systems are currently widely used for language recognition. To achieve better performance, researchers often combine multiple subsystems, and the combined results are frequently much better than those of a single SLR system. Phonotactic SLR subsystems may differ in their acoustic feature vectors, or may use multiple language-specific phone recognizers and different acoustic models. These methods achieve good performance but usually at a high computational cost. In this paper, we propose a new way to diversify phonotactic language recognition systems: vector space modeling with support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but each models the data in a different vector space produced by the SSR algorithm, without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three SVM supervector reconstruction algorithms: relative, functional, and perturbing SVM supervector reconstruction. All three are combined through a linear discriminant analysis-maximum mutual information (LDA-MMI) back end to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system outperforms a baseline phone recognition-vector space modeling (PR-VSM) system at minimal extra computational cost. It yields equal error rates (EERs) of 1.39%, 3.63%, and 14.79% for the 30-, 10-, and 3-s test conditions, respectively, corresponding to relative improvements of 6.06%, 10.15%, and 10.53% over the baseline system.
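The idea of one shared front end feeding several vector spaces can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 1-best phone string rather than lattice-based expected counts, normalizes all N-gram orders jointly for brevity, and the three transforms (`relative_view`, `functional_view`, `perturbed_view`) are hypothetical stand-ins chosen only to show how several representations can be derived cheaply from one set of N-gram counts. The paper's actual relative, functional, and perturbing SSR definitions may differ.

```python
import math
import random
from collections import Counter


def ngram_counts(phones, n_max=3):
    """Count every 1..n_max-gram in a decoded phone sequence (1-best, for simplicity)."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return counts


def supervector(counts, vocab):
    """Relative-frequency supervector over a fixed, ordered N-gram vocabulary."""
    total = sum(counts[g] for g in vocab) or 1
    return [counts[g] / total for g in vocab]


# --- Hypothetical reconstructions: illustrative stand-ins, NOT the paper's SSR ---

def relative_view(x, anchors):
    """Re-express x by its cosine similarity to a set of anchor supervectors."""
    def cos(a, b):
        na = math.sqrt(sum(v * v for v in a))
        nb = math.sqrt(sum(v * v for v in b))
        return sum(p * q for p, q in zip(a, b)) / (na * nb) if na and nb else 0.0
    return [cos(x, a) for a in anchors]


def functional_view(x):
    """Apply an element-wise nonlinearity (square root) to the supervector."""
    return [math.sqrt(v) for v in x]


def perturbed_view(x, keep=0.8, seed=0):
    """Zero out a random subset of dimensions; a fixed seed gives every utterance the same mask."""
    rng = random.Random(seed)
    mask = [1.0 if rng.random() < keep else 0.0 for _ in x]
    return [v * m for v, m in zip(x, mask)]


if __name__ == "__main__":
    phones = ["a", "b", "a", "b", "c"]  # toy decoded phone string
    counts = ngram_counts(phones, n_max=2)  # decoding and counting are done once
    x = supervector(counts, sorted(counts))
    views = [relative_view(x, [x]), functional_view(x), perturbed_view(x)]
    print(views)
```

In this sketch only the final transform differs between subsystems; each view would then train its own SVM, and the resulting scores would be fused by a back end such as the LDA-MMI one described above.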
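The results above are reported as equal error rates (EER), the operating point at which the miss rate equals the false-alarm rate. A minimal sketch of estimating EER from detection scores follows, assuming higher scores favor the target language; the function name `equal_error_rate` is hypothetical, and this simple threshold sweep is only one common approximation (evaluation toolkits may interpolate between operating points instead).

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping every observed score as a decision threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:  # keep the threshold where the two rates are closest
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))
```

For the toy scores in the usage example, one target and one non-target trial overlap, so the sketch reports an EER of 0.25.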
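The reported relative improvements can be related back to the baseline EERs. Assuming relative improvement is defined as (baseline EER - proposed EER) / baseline EER (the abstract does not state the definition), the implied baseline EERs can be recovered as below; because the reported percentages are rounded, these are approximations, and `implied_baseline` is a hypothetical helper name.

```python
def implied_baseline(eer_new, rel_improvement):
    """Invert rel = (base - new) / base to recover the baseline EER (same units as eer_new)."""
    return eer_new / (1.0 - rel_improvement)


# (test condition, reported HEPLR EER in %, reported relative improvement as a fraction)
REPORTED = [("30 s", 1.39, 0.0606), ("10 s", 3.63, 0.1015), ("3 s", 14.79, 0.1053)]

for cond, eer, rel in REPORTED:
    print(f"{cond}: implied baseline EER ~ {implied_baseline(eer, rel):.2f}%")
```

Under that assumption, the implied PR-VSM baseline EERs are about 1.48%, 4.04%, and 16.53% for the 30-, 10-, and 3-s conditions.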

Homogenous Ensemble Phonotactic Language Recognition Based on SVM Supervector Reconstruction