Query Expansion Based High Performance Chinese Voice Retrieval

Wei LI, Ji WU, Ping LÜ
DOI: https://doi.org/10.3969/j.issn.1003-6059.2011.04.015
2011-01-01
Pattern Recognition and Artificial Intelligence
Abstract: The aim of a Chinese voice retrieval system is to locate query texts in audio files quickly and precisely. In a typical implementation, voice files are transcribed by a speech recognizer and the results are stored in an index. The system segments each query into a word sequence and searches the index with that sequence. A mismatch between the query segmentation and the recognizer's output can degrade the system's performance. To solve this problem, multiple segmentation results and prefix-suffix expansions are used to broaden the original query, and retrieval is carried out on the outputs of this expansion. Query expansion generates a large number of outputs, which slows down retrieval. To increase the system's efficiency, a finite state automaton (FSA) is introduced to compress the query expansions, and a token-based search algorithm is used for fast search. Experimental results show that query expansion yields a relative improvement of about 50% to 70% in the system's equal error rate (EER). The FSA compresses the retrieval space and increases retrieval speed by a factor of nearly 30.
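The abstract does not specify how the FSA is built, so the following is only a minimal sketch of the general idea, under stated assumptions: states are character offsets in the query, and each arc carries one lexicon word. The function names, the toy lexicon, and the example query are all hypothetical, and the prefix-suffix expansion and the token-based search from the paper are not modelled here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: arcs[state] = list of (word, next_state),
# where a state is a character offset into the query.
Arcs = Dict[int, List[Tuple[str, int]]]


def build_segmentation_fsa(query: str, lexicon: Set[str]) -> Arcs:
    """Build an acyclic automaton whose paths are all segmentations of `query`."""
    n = len(query)
    arcs: Arcs = {i: [] for i in range(n + 1)}
    for start in range(n):
        for end in range(start + 1, n + 1):
            word = query[start:end]
            if word in lexicon:
                # One shared arc, however many segmentations pass through it.
                arcs[start].append((word, end))
    return arcs


def enumerate_paths(arcs: Arcs, state: int, final: int) -> List[List[str]]:
    """Expand the automaton back into explicit word sequences (for comparison)."""
    if state == final:
        return [[]]
    paths: List[List[str]] = []
    for word, next_state in arcs[state]:
        for rest in enumerate_paths(arcs, next_state, final):
            paths.append([word] + rest)
    return paths


# Toy lexicon and query, chosen for illustration only (an assumption).
LEXICON = {"北京", "北京大学", "大", "大学", "大学生", "学生", "生"}
QUERY = "北京大学生"

fsa = build_segmentation_fsa(QUERY, LEXICON)
paths = enumerate_paths(fsa, 0, len(QUERY))
arc_count = sum(len(outgoing) for outgoing in fsa.values())
words_in_paths = sum(len(path) for path in paths)

print(paths)
print(arc_count, words_in_paths)
```

For this toy input the automaton stores 7 arcs, while the 4 explicit segmentations contain 10 words in total. The gap widens as queries grow longer, because the number of paths can grow much faster than the number of arcs, which is the intuition behind compressing the expanded query set before searching.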