Improved Semantic Retrieval of Spoken Content by Language Models Enhanced with Acoustic Similarity Graph

Hung-yi Lee, Tsung-Hsien Wen, Lin-Shan Lee
DOI: https://doi.org/10.1109/slt.2012.6424219
2012-01-01
Abstract: Retrieving objects semantically related to the query has been widely studied in text information retrieval. However, when text-based techniques are applied to spoken content, the inevitable recognition errors may seriously degrade performance. In this paper, we propose to enhance the expected term frequencies estimated from spoken content using acoustic similarity graphs. For each word in the lexicon, a graph is constructed describing the acoustic similarity among spoken segments in the archive. Score propagation over the graph helps in estimating the expected term frequencies. The enhanced expected term frequencies can be used in the language modeling retrieval approach, as well as in semantic retrieval techniques such as document expansion based on latent semantic analysis and query expansion considering both words and latent topic information. Preliminary experiments performed on Mandarin broadcast news indicated that improved performance was achievable under different conditions.
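The score propagation step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the update rule, so everything below is an assumption made for illustration: a random-walk-style interpolation between each segment's initial recognition score and the similarity-weighted average of its neighbours' scores. The function names (`propagate_scores`, `expected_term_frequency`), the `alpha` weight, and the per-document summation are hypothetical and are not taken from the paper.

```python
import numpy as np

def propagate_scores(similarity, initial, alpha=0.5, iters=100, tol=1e-9):
    """Smooth per-segment scores for one word over its similarity graph.

    Assumed update (not from the paper):
        s <- (1 - alpha) * s0 + alpha * P @ s
    where P is the row-normalised similarity matrix.
    """
    W = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(W, 0.0)  # ignore self-similarity
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Segments with no neighbours simply keep their own score.
    idx = np.flatnonzero(row_sums.ravel() == 0)
    P[idx, idx] = 1.0

    s0 = np.asarray(initial, dtype=float)
    s = s0.copy()
    for _ in range(iters):
        s_next = (1.0 - alpha) * s0 + alpha * (P @ s)
        if np.max(np.abs(s_next - s)) < tol:
            return s_next
        s = s_next
    return s

def expected_term_frequency(scores, doc_ids, num_docs):
    """Sum propagated segment scores per document (assumed aggregation)."""
    return np.bincount(doc_ids, weights=scores, minlength=num_docs)
```

In this sketch, a segment with a low initial score that is acoustically similar to high-scoring segments is pulled upward, and an isolated segment is left unchanged. One such graph would be built per lexicon word, with `doc_ids` mapping each segment to the spoken document that contains it.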