Abstract:<p>Keyword search (KWS) is the task of finding user-specified keywords in continuous speech. Conventional KWS systems are based on Automatic Speech Recognition (ASR): the input speech must first be processed by the ASR system before keywords can be searched. Over the past decade, as deep learning and deep neural networks (DNN) have become increasingly popular, it has also become possible to train KWS systems in an end-to-end (E2E) manner. The main advantage of E2E KWS is that it needs no speech recognition, which makes training and search much more straightforward than in conventional systems. This article proposes an E2E KWS model that consists of four parts: a speech encoder-decoder, a query encoder-decoder, an attention mechanism, and an energy scorer. First, the proposed model outperforms the baseline model. Second, we find that, depending on the supervision used (character or phoneme sequences), the speech and query encoders extract the corresponding information, which leads to different performance. Moreover, we introduce an attention mechanism and a novel energy scorer: the former helps locate keywords, while the latter makes the final decision by considering speech embeddings, query embeddings, and attention weights in parallel. We evaluate our model under low-resource conditions, with about 10 hours of training data, for four different languages. The experimental results show that the proposed model works well under low-resource conditions.</p>
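
The abstract gives no equations, so the following is only a minimal illustrative sketch of the general idea that attention weights can help locate a keyword and that a score can combine speech embeddings, a query embedding, and attention weights. It is not the paper's energy scorer. Dot-product attention, the sigmoid score, and all names (`attend`, `detection_score`, the toy embeddings) are assumptions introduced for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query_vec, frame_vecs):
    """Hypothetical dot-product attention of one query embedding over speech frames.

    Returns the per-frame attention weights and the attention-weighted
    average of the frame embeddings.
    """
    weights = softmax([dot(query_vec, frame) for frame in frame_vecs])
    dim = len(frame_vecs[0])
    context = [
        sum(w * frame[i] for w, frame in zip(weights, frame_vecs))
        for i in range(dim)
    ]
    return weights, context


def detection_score(query_vec, frame_vecs):
    """Assumed stand-in scorer: sigmoid of the query/context similarity.

    It uses the query embedding, the speech embeddings, and the attention
    weights together, but it is NOT the energy scorer from the paper.
    """
    weights, context = attend(query_vec, frame_vecs)
    energy = dot(query_vec, context)
    return 1.0 / (1.0 + math.exp(-energy)), weights


# Toy data: three speech frames with 2-dimensional embeddings.
frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
matching_query = [0.0, 4.0]     # aligned with the third frame
mismatched_query = [0.0, -4.0]  # points away from every frame

match_score, match_weights = detection_score(matching_query, frames)
miss_score, _ = detection_score(mismatched_query, frames)
located_frame = match_weights.index(max(match_weights))
```

In this toy setup the largest attention weight falls on the frame most similar to the query, which is the sense in which attention can indicate where a keyword occurs. A real system would learn the embeddings and the scorer from data.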

Web-based keyword adapted Language Modeling for Keyword Spotting

MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting

End-to-end keyword search system based on attention mechanism and energy scorer for low resource languages

Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model

U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting

Multilingual Query-by-Example Keyword Spotting with Metric Learning and Phoneme-to-Embedding Mapping

Keyword-Specific Acoustic Model Pruning for Open-Vocabulary Keyword Spotting

Query-by-Example Keyword Spotting Using Spectral-Temporal Graph Attentive Pooling and Multi-Task Learning

Conditional Online Learning for Keyword Spotting

Keyword-specific normalization based keyword spotting for spontaneous speech

Exploring Representation Learning for Small-Footprint Keyword Spotting

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

Exploiting Noisy Web Data by OOV Ranking for Low-Resource Keyword Search

Keyword Spotting Based on Hypothesis Boundary Realignment and State-Level Confidence Weighting

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments

CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework Based on Cascaded Transducer-Transformer

GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting

Audio-visual Keyword Spotting for Mandarin Based on Discriminative Local Spatial-Temporal Descriptors

Avoid Overfitting User Specific Information in Federated Keyword Spotting