Abstract: Spoken term detection (STD) is a key technology for retrieving spoken content, and it will be very important for retrieving and browsing multimedia content over the Internet. The discriminative capability of machine learning methods has recently been used to facilitate STD. This paper presents a new approach to improving STD using support vector machines (SVMs) based on acoustic information. It adopts the concept of pseudo-relevance feedback (PRF), which is widely used in text, image, and video retrieval. The basic idea is to assume that some spoken segments in the first-pass retrieved results are relevant (pseudo-relevant) and some others are irrelevant (pseudo-irrelevant), and to use these segments as positive and negative examples to train a query-specific SVM. This SVM is then used to re-rank the first-pass retrieved results, and only the re-ranked results are shown to the user. In this paper, the acoustic feature vectors that represent the spoken segments in the SVM are considered and analyzed. Furthermore, conventional PRF takes the items with the highest and lowest scores in the first-pass retrieved results as pseudo-relevant and pseudo-irrelevant, respectively, but this inevitably puts some incorrect examples into the training data, especially when recognition accuracy is poor. We therefore further propose an enhanced SVM that not only selects positive and negative examples better by considering the reliability of the spoken segments, but also places more emphasis on the more reliable training examples by modifying the SVM formulation. Experiments on two different sets of spoken archives with different speaking styles and different levels of recognition accuracy demonstrated significant improvements offered by the proposed approaches.
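The PRF re-ranking loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Pseudo examples are taken from the top and bottom of the first-pass list, as in conventional PRF. Each example's reliability weight is derived from its first-pass score, which is a hypothetical choice that assumes scores lie in [0, 1]. The modified SVM formulation is approximated by a linear SVM whose per-example hinge loss is scaled by that weight, trained with plain full-batch subgradient descent. The paper's improved example selection is not reproduced. The 2-D feature vectors are toy stand-ins for the acoustic features the paper analyzes, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def select_pseudo_examples(
    scores: Sequence[float], n_pos: int, n_neg: int
) -> List[Tuple[int, int, float]]:
    """Conventional PRF selection: the n_pos highest-scoring first-pass results
    become pseudo-relevant (+1), the n_neg lowest-scoring pseudo-irrelevant (-1).
    Hypothetical reliability weight: score for positives, 1 - score for
    negatives (assumes first-pass scores lie in [0, 1])."""
    if n_pos + n_neg > len(scores):
        raise ValueError("not enough first-pass results for the requested examples")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    examples = [(i, +1, scores[i]) for i in order[:n_pos]]
    examples += [(i, -1, 1.0 - scores[i]) for i in order[len(order) - n_neg:]]
    return examples


def train_weighted_svm(xs, ys, cs, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM in which example i's hinge loss is scaled by c_i:
    minimize lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i * (w.x_i + b)),
    solved by full-batch subgradient descent (a simple stand-in solver)."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regularizer
        gb = 0.0
        for x, y, c in zip(xs, ys, cs):
            if y * (dot(w, x) + b) < 1.0:  # example is inside the margin
                for j in range(dim):
                    gw[j] -= c * y * x[j] / n  # weighted hinge subgradient
                gb -= c * y / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def prf_rerank(features: Sequence[Vector], scores: Sequence[float],
               n_pos: int, n_neg: int, **svm_kwargs) -> List[int]:
    """Train a query-specific SVM on pseudo examples, then re-rank every
    first-pass result by its SVM decision value (highest first)."""
    examples = select_pseudo_examples(scores, n_pos, n_neg)
    xs = [features[i] for i, _, _ in examples]
    ys = [y for _, y, _ in examples]
    cs = [c for _, _, c in examples]
    w, b = train_weighted_svm(xs, ys, cs, **svm_kwargs)
    return sorted(range(len(features)),
                  key=lambda i: dot(w, features[i]) + b, reverse=True)


if __name__ == "__main__":
    # Toy first-pass results (hypothetical): 2-D features and first-pass scores.
    # Segment 2 is a high-scoring false alarm; segment 3 is a low-scoring hit.
    feats = [(1.0, 0.0), (0.9, 0.1), (0.2, 0.8), (0.8, 0.2), (0.1, 0.9), (0.0, 1.0)]
    first_pass = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
    print(prf_rerank(feats, first_pass, n_pos=2, n_neg=2))  # prints [0, 1, 3, 2, 4, 5]
```

In this toy run, segment 3 (relevant but scored low in the first pass) moves above the false alarm at segment 2. The SVM learns from the acoustic-feature similarity to the pseudo examples rather than from the first-pass scores. Scaling only the hinge term by the weight keeps the usual SVM geometry while letting less reliable pseudo examples pull less on the separating hyperplane.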

Improved Speech Summarization and Spoken Term Detection with Graphical Analysis of Utterance Similarities

Exploring hypergraph-based semi-supervised ranking for query-oriented summarization

Exploring Simultaneous Keyword and Key Sentence Extraction

Exploring simultaneous keyword and key sentence extraction: improve graph-based ranking using Wikipedia

Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space

Improved open-vocabulary spoken content retrieval with word and subword lattices using acoustic feature similarity

Improved Semantic Retrieval of Spoken Content by Document/Query Expansion with Random Walk Over Acoustic Similarity Graphs

Improved Semantic Retrieval of Spoken Content by Language Models Enhanced with Acoustic Similarity Graph

Graph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languages

Supervised Spoken Document Summarization Jointly Considering Utterance Importance and Redundancy by Structured Support Vector Machine

Supervised Spoken Document Summarization Based On Structured Support Vector Machine With Utterance Clusters As Hidden Variables

Improved affinity graph based multi-document summarization

Hierarchical Summarization for Longform Spoken Dialog

An Integrated Graph Model For Document Summarization

Jointly Considering Utterance Importance and Redundancy by Structured Support Vector Machine

Towards A Unified Approach Based On Affinity Graph To Various Multi-Document Summarizations

Hierarchical Graph Summarization: Leveraging Hybrid Information through Visible and Invisible Linkage

SgSum: Transforming Multi-document Summarization into Sub-graph Selection

HyperSum: hypergraph based semi-supervised sentence ranking for query-oriented summarization

Enhanced Spoken Term Detection Using Support Vector Machines and Weighted Pseudo Examples

LexRank: Graph-based Lexical Centrality as Salience in Text Summarization