Out-of-Vocabulary Issue in Chinese Spoken Term Detection and A Two-Stage Chinese Speech Retrieval Method

孟莎,刘加
DOI: https://doi.org/10.3969/j.issn.1003-0077.2009.06.014
2009-01-01
Abstract: While the Out-of-Vocabulary (OOV) problem remains a challenge for English spoken term detection tasks, it is underestimated for Chinese. This is because a Chinese OOV query term can still be matched as a sequence of Chinese characters, with each character itself being a word in the vocabulary. However, our experiments show that search accuracy differs significantly depending on whether the query is in the vocabulary. We examine this problem with a word-lattice-based spoken term detection task. We propose a two-stage method that first locates candidates by partial phonetic matching and then refines the matching score with word lattice rescoring. Experiments show that the proposed method achieves a 24.1% relative improvement for OOV queries on a large-scale Chinese spoken term detection task.
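
The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the sliding-window syllable comparison, the linear interpolation in stage two, and all scores below are assumptions made purely for illustration. The abstract does not specify the matching or rescoring formulas.

```python
from typing import Dict, List, Tuple


def partial_match_candidates(
    query: List[str], syllables: List[str], min_ratio: float = 0.5
) -> List[Tuple[int, float]]:
    """Stage 1 (assumed form): slide the query over a recognized syllable
    stream and keep windows where enough positions agree."""
    n = len(query)
    candidates = []
    for start in range(len(syllables) - n + 1):
        window = syllables[start:start + n]
        hits = sum(q == s for q, s in zip(query, window))
        ratio = hits / n
        if ratio >= min_ratio:
            candidates.append((start, ratio))
    return candidates


def rescore(
    candidates: List[Tuple[int, float]],
    lattice_scores: Dict[int, float],
    weight: float = 0.5,
) -> List[Tuple[int, float]]:
    """Stage 2 (assumed form): interpolate the phonetic match ratio with a
    lattice-derived score in [0, 1], then rank by the combined score."""
    rescored = [
        (start, weight * ratio + (1 - weight) * lattice_scores.get(start, 0.0))
        for start, ratio in candidates
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical pinyin syllable stream; "yin" stands in for a recognition error.
syllables = ["ao", "yin", "hui", "zai", "bei", "jing", "ao", "yun", "hui"]
query = ["ao", "yun", "hui"]

candidates = partial_match_candidates(query, syllables)

# Made-up lattice scores keyed by candidate start position.
lattice_scores = {0: 0.9, 6: 0.2}
ranked = rescore(candidates, lattice_scores)
print(ranked)
```

In this toy run, stage one keeps both the exact match and the partial match, and stage two can reorder them once the (hypothetical) lattice evidence is taken into account.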