Abstract: Spoken term detection (STD) is the task of searching for occurrences of spoken terms in audio archives. It relies on robust confidence estimation to make a hit/false alarm (FA) decision. To optimize this decision in terms of the STD evaluation metric, the confidence has to be discriminative. Multi-layer perceptrons (MLPs) and support vector machines (SVMs) perform well in producing discriminative confidence; however, they are severely limited by their continuous objective functions and are therefore less capable of dealing with complex decision tasks. This leads to a substantial performance reduction when detecting out-of-vocabulary (OOV) terms, where the high diversity in term properties usually leads to a complicated decision boundary. In this paper we present a new discriminative confidence estimation approach based on evolutionary discriminant analysis (EDA). Unlike MLPs and SVMs, EDA uses the classification error as its objective function, resulting in a model optimized towards the evaluation metric. In addition, EDA combines heterogeneous projection functions and classification strategies in decision making, leading to a highly flexible classifier that is capable of dealing with complex decision tasks. Finally, the evolutionary strategy of EDA reduces the risk of becoming trapped in local minima. We tested the EDA-based confidence with a state-of-the-art phoneme-based STD system, which employs a phoneme speech recognition system to produce lattices within which the phoneme sequences corresponding to the enquiry terms are searched, on an English meeting domain corpus. The test corpora comprise 11 h of speech data recorded with individual head-mounted microphones from 30 meetings carried out at several institutes, including ICSI, NIST, ISL, LDC, the Virginia Polytechnic Institute and State University, and the University of Edinburgh.
The experimental results demonstrate that EDA considerably outperforms MLPs and SVMs on both classification and confidence measurement in STD, and that the advantage is more significant on OOV terms than on in-vocabulary (INV) terms. In terms of classification performance, EDA achieved an equal error rate (EER) of 11% on OOV terms, compared to 34% and 31% with MLPs and SVMs, respectively; for INV terms, EDA obtained an EER of 15%, compared to 17% with MLPs and SVMs. In terms of STD performance for OOV terms, EDA yielded a significant relative improvement of 1.4% and 2.5% in average term-weighted value (ATWV) over MLPs and SVMs, respectively.
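
To make the equal error rate (EER) reported above concrete, the following is a minimal sketch of how an EER could be computed from a set of detection confidences. It is not the evaluation code used in this work: the function name `equal_error_rate`, the toy scores and the threshold sweep over observed score values are all assumptions made for illustration. A detection is labelled 1 if it is a hit and 0 if it is a false alarm, and the EER is approximated at the threshold where the miss rate and the false alarm rate are closest.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER of confidence scores (hypothetical helper).

    scores: confidence values, higher means more likely a hit.
    labels: 1 for a true hit, 0 for a false alarm.
    Returns (eer, threshold) at the point where miss rate and
    false alarm rate are closest.
    """
    hits = [s for s, y in zip(scores, labels) if y == 1]
    fas = [s for s, y in zip(scores, labels) if y == 0]
    if not hits or not fas:
        raise ValueError("need at least one hit and one false alarm")

    best_gap, best_eer, best_thr = None, None, None
    # Sweep every observed score as a candidate decision threshold.
    for thr in sorted(set(scores)):
        miss_rate = sum(s < thr for s in hits) / len(hits)   # hits rejected
        fa_rate = sum(s >= thr for s in fas) / len(fas)      # FAs accepted
        gap = abs(miss_rate - fa_rate)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (miss_rate + fa_rate) / 2
            best_thr = thr
    return best_eer, best_thr


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
eer, thr = equal_error_rate(scores, labels)
print(f"EER = {eer:.2f} at threshold {thr}")  # EER = 0.25 at threshold 0.6
```

On real data the two error rates rarely coincide exactly at an observed threshold, so the sketch reports their average at the closest point; interpolating between neighbouring thresholds is a common refinement.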

Evolutionary Discriminative Confidence Estimation for Spoken Term Detection
