Abstract: Spoken term detection (STD) is a key technology for spoken content retrieval, and it will become increasingly important for retrieving and browsing multimedia content over the Internet. The discriminative capability of machine learning methods has recently been used to improve STD. This paper presents a new approach that improves STD using support vector machines (SVMs) based on acoustic information. It adopts the concept of pseudo-relevance feedback (PRF), which is widely used in text, image, and video retrieval. The basic idea of PRF here is to assume that some spoken segments in the first-pass retrieved results are relevant (pseudo-relevant) and some others are irrelevant (pseudo-irrelevant), and to use these segments as positive and negative examples to train a query-specific SVM. This SVM is then used to re-rank the first-pass retrieved results, and only the re-ranked results are shown to the user. In this paper, we consider and analyze feature vectors that represent the spoken segments based on acoustic information for use in the SVM. Furthermore, conventional PRF takes the items with the highest and lowest scores in the first-pass retrieved results as pseudo-relevant and pseudo-irrelevant, respectively, but this inevitably introduces some incorrect examples into the training data, especially when the recognition accuracy is poor. We further propose an enhanced SVM that not only selects positive and negative examples more carefully by considering the reliability of the spoken segments, but also places more emphasis on the more reliable training examples by modifying the SVM formulation. Experiments on two spoken archives with different speaking styles and different levels of recognition accuracy demonstrated significant improvements from the proposed approaches.
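The PRF re-ranking loop and the weighted-example idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the acoustic feature vectors of the segments are assumed to be precomputed, the first-pass scores are assumed to lie in [0, 1] and are reused as a hypothetical reliability measure, and the "modified SVM formulation" is stood in for by a standard instance-weighted linear SVM (a per-example weight c_i on the hinge loss) trained with plain subgradient descent rather than a QP solver. The selection step shown is the conventional top/bottom rule, not the paper's improved selection. All function names (select_pseudo_examples, train_weighted_linear_svm, prf_rerank) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def select_pseudo_examples(
    scores: Sequence[float], k_pos: int, k_neg: int
) -> Tuple[List[int], List[int]]:
    """Conventional PRF selection: the k_pos highest-scoring segments are
    pseudo-relevant and the k_neg lowest-scoring ones pseudo-irrelevant."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k_pos], order[len(order) - k_neg:]


def train_weighted_linear_svm(
    xs: Sequence[Vector],
    ys: Sequence[float],
    cs: Sequence[float],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the instance-weighted objective
        lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i * (w.x_i + b)),
    so examples with a larger weight c_i cost more when misclassified."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for x, y, c in zip(xs, ys, cs):
            if y * (dot(w, x) + b) < 1.0:  # hinge loss is active for this example
                for j in range(dim):
                    grad_w[j] -= c * y * x[j] / n
                grad_b -= c * y / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def prf_rerank(
    features: Sequence[Vector],
    scores: Sequence[float],
    k_pos: int = 2,
    k_neg: int = 2,
    weighted: bool = True,
) -> List[int]:
    """Train a query-specific SVM on pseudo examples and return the segment
    indices re-ranked by the SVM decision value."""
    pos, neg = select_pseudo_examples(scores, k_pos, k_neg)
    xs = [features[i] for i in pos + neg]
    ys = [1.0] * len(pos) + [-1.0] * len(neg)
    if weighted:
        # Hypothetical reliability: a pseudo-relevant segment is trusted in
        # proportion to its first-pass score, a pseudo-irrelevant one in
        # proportion to (1 - score). Assumes scores lie in [0, 1].
        cs = [scores[i] for i in pos] + [1.0 - scores[i] for i in neg]
    else:
        cs = [1.0] * len(xs)
    w, b = train_weighted_linear_svm(xs, ys, cs)
    svm_scores = [dot(w, f) + b for f in features]
    return sorted(range(len(features)), key=lambda i: svm_scores[i], reverse=True)


# Toy first-pass results: six segments with 2-D features and first-pass scores.
features = [[1.0, 0.9], [0.9, 1.0], [0.1, 0.0], [0.95, 0.95], [0.0, 0.1], [0.05, 0.05]]
scores = [0.95, 0.90, 0.60, 0.50, 0.10, 0.05]
print(prf_rerank(features, scores))
```

In the toy data, segment 3 is scored below segment 2 in the first pass even though its features resemble those of the top-ranked segments; after re-ranking by the query-specific SVM it moves above segment 2. A per-example weight on the hinge loss is one simple way to let reliable pseudo examples dominate training; the paper's actual reliability measure and selection criteria may differ.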

Improved Spoken Term Detection by Discriminative Training of Acoustic Models Based on User Relevance Feedback

Improved Spoken Term Detection by Feature Space Pseudo-Relevance Feedback

A Framework Integrating Different Relevance Feedback Scenarios and Approaches for Spoken Term Detection

Improved spoken term detection using support vector machines with acoustic and context features from pseudo-relevance feedback

Integrating Recognition and Retrieval with User Feedback: A New Framework for Spoken Term Detection

Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space

Improved spoken term detection using support vector machines based on lattice context consistency

Enhanced Spoken Term Detection Using Support Vector Machines and Weighted Pseudo Examples

Improved Semantic Retrieval of Spoken Content by Language Models Enhanced with Acoustic Similarity Graph

Improved Semantic Retrieval of Spoken Content by Document/Query Expansion with Random Walk Over Acoustic Similarity Graphs

Semantic Query Expansion and Context-Based Discriminative Term Modeling for Spoken Document Retrieval

Improved open-vocabulary spoken content retrieval with word and subword lattices using acoustic feature similarity

Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection

Open-Vocabulary Retrieval of Spoken Content with Shorter/Longer Queries Considering Word/Subword-based Acoustic Feature Similarity

Improving Audio-visual Speech Recognition Performance with Cross-modal Student-teacher Training

Graph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languages

Spoken Term Detection Using Dynamic Match Subword Confusion Network

Enhancing Query Expansion for Semantic Retrieval of Spoken Content with Automatically Discovered Acoustic Patterns

An initial attempt to improve spoken term detection by learning optimal weights for different indexing features

Discriminative Boosting Algorithm for Diversified Front-End Phonotactic Language Recognition

Unsupervised Discovery of Structured Acoustic Tokens with Applications to Spoken Term Detection