Abstract: Keyword search (KWS) is the task of searching continuous speech for keywords given by the user. Conventional KWS systems are based on automatic speech recognition (ASR): the input speech must first be processed by the ASR system before keywords can be searched. Over the past decade, as deep learning and deep neural networks (DNNs) have become increasingly popular, KWS systems can also be trained in an end-to-end (E2E) manner. The main advantage of E2E KWS is that it does not require speech recognition, which makes training and searching much more straightforward than in traditional systems. This article proposes an E2E KWS model consisting of four parts: a speech encoder-decoder, a query encoder-decoder, an attention mechanism, and an energy scorer. First, the proposed model outperforms the baseline model. Second, we find that under different supervision targets (character or phoneme sequences), the speech and query encoders extract the corresponding information, resulting in different performance. Moreover, we introduce an attention mechanism and propose a novel energy scorer: the former helps locate keywords, while the latter makes the final decision by considering speech embeddings, query embeddings, and attention weights in parallel. We evaluate our model under low-resource conditions, with about 10 hours of training data for four different languages. The experimental results show that the proposed model works well under low-resource conditions.
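The interplay between the attention mechanism (locating the keyword) and the energy scorer (making the final decision) can be illustrated with a small sketch. This is a hypothetical illustration in plain Python, not the authors' model: it assumes scaled dot-product attention of a single query embedding over speech-frame embeddings, and it replaces the learned energy scorer with a fixed logistic combination of two features. The function names `attend` and `energy_score`, the feature choice, and the values of `w` and `bias` are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, frames: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of one query embedding over speech frames."""
    d = len(query)
    weights = softmax([dot(f, query) / math.sqrt(d) for f in frames])
    # Context vector: attention-weighted sum of the frame embeddings.
    context = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(d)]
    return weights, context


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def energy_score(
    query: Vector,
    frames: List[Vector],
    w: Tuple[float, float] = (4.0, 2.0),  # hypothetical fixed weights
    bias: float = -2.0,                   # hypothetical fixed bias
) -> float:
    """Hypothetical scorer: uses the query embedding, the attended speech
    embedding, and the attention weights together, then squashes to (0, 1)."""
    weights, context = attend(query, frames)
    features = (cosine(context, query), max(weights))
    energy = bias + sum(wi * fi for wi, fi in zip(w, features))
    return 1.0 / (1.0 + math.exp(-energy))


if __name__ == "__main__":
    query = [1.0, 0.0]
    hit = [[0.0, 1.0], [3.0, 0.0], [0.0, 1.0]]    # frame 1 resembles the query
    miss = [[0.0, 1.0], [0.0, 1.0], [0.0, -1.0]]  # no frame resembles it
    print(round(energy_score(query, hit), 2))   # 0.97
    print(round(energy_score(query, miss), 2))  # 0.21
```

Here the attention weights peak on the frame that matches the query (a crude form of keyword localization), and the scorer turns the resulting evidence into a detection probability. In a trained E2E system the embeddings come from the encoders and the scorer is a learned network rather than fixed weights.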

A Novel Discriminative Score Calibration Method for Keyword Search

Calibration of Word Posterior Estimation in Confusion Networks for Keyword Search

Experimental Investigation into Alignment-based Acoustic Confidence Measures in Keyword Verification for Mandarin Speech

Improved Keyword Spotting System by Optimizing Posterior Confidence Measure Vector Using Feed-Forward Neural Network

Improved System Fusion for Keyword Search

A Rescoring Approach for Keyword Search Using Lattice Context Information

Improving Keyword Search by Query Expansion in a Probabilistic Framework

Integrate Document Ranking Information into Confidence Measure Calculation for Spoken Term Detection

Keyword Spotting Based on Hypothesis Boundary Realignment and State-Level Confidence Weighting

Word Spotting Based on a Posterior Measure of Keyword Confidence

A Novel Re-weighted CTC Loss for Data Imbalance in Speech Keyword Spotting

An LSTM-CTC Based Verification System for Proxy-Word Based OOV Keyword Search

Keyword-Specific Normalization Based Keyword Spotting for Spontaneous Speech

Keyword Spotting Based on Syllable Confusion Network

A Two-Step Keyword Spotting Method Based on Context-Dependent a Posteriori Probability

Audio-Visual Keyword Spotting for Mandarin Based on Discriminative Local Spatial-Temporal Descriptors

Lexical Access-Based Confidence Measure for a Spanish Keyword Spotting System

End-to-End Keyword Search System Based on Attention Mechanism and Energy Scorer for Low Resource Languages

Bayesian Estimation of Keyword Confidence in Chinese Continuous Speech Recognition

A Novel Two-Level Architecture Plus Confidence Measures for a Keyword Spotting System

Multicalibration for Confidence Scoring in LLMs