Query spelling correction for multi-language search engines

Bo Zhou, Min Zhang, Shaoping Ma, Yiqun Liu, Liyun Ru
2009-01-01
Journal of Computational Information Systems
Abstract: Query spelling correction is crucial to search engines for improving web search relevance and user experience. It poses difficulties beyond traditional lexicon-based spelling correction, since queries contain a vast number of out-of-lexicon but legitimate terms, and context-sensitive techniques do not work on queries that average fewer than 3 terms. Most recent studies concentrate on spelling correction of English queries. In multi-language search engines, however, queries often contain terms in different languages. Inspired by the fact that users with various language backgrounds (especially non-Latin ones) all resort to different but romanized intermediate systems to type their language into a computer, in this paper we present a general spelling correction approach for multi-language queries. The approach is based on the widely used Source Channel Model and on knowledge extracted from search logs. Groups of queries in both English and Chinese are chosen for evaluation, because these are the most dominant languages on the Web and because Chinese romanization is a phonetic transcription, which makes its romanization system more complicated than those of other languages. Experiments performed on 57,280 randomly sampled real search queries show that 91.67% precision and 81.31% recall have been achieved. The key contribution of this paper is the idea that spelling correction of romanization systems can solve spelling correction problems for different languages. 1553-9105/ Copyright © 2009 Binary Information Press.
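The abstract does not give the details of the model, so the following is only a minimal sketch of a generic source channel (noisy channel) corrector applied to romanized query terms, not the authors' implementation. It picks the candidate c that maximizes P(c) · P(q | c), where the prior P(c) comes from query-log frequencies and the channel probability P(q | c) is approximated by a per-edit penalty. The toy log `QUERY_LOG`, its counts, the `error_rate` and `max_dist` parameters, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy search log: romanized terms (English and pinyin) with
# invented frequencies. A real system would extract these from query logs.
QUERY_LOG = Counter({"baidu": 50, "beijing": 40, "google": 60, "weather": 30})


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def correct(query: str, log: Counter = QUERY_LOG,
            max_dist: int = 2, error_rate: float = 0.05) -> str:
    """Return argmax_c log P(c) + log P(query | c) over logged terms.

    Assumed channel model: each edit costs a factor of `error_rate`.
    If no logged term is within `max_dist` edits, the query is returned
    unchanged, so out-of-lexicon but legitimate terms are left alone.
    """
    total = sum(log.values())
    best, best_score = query, -math.inf
    for cand, count in log.items():
        dist = edit_distance(query, cand)
        if dist > max_dist:
            continue
        score = math.log(count / total) + dist * math.log(error_rate)
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    for q in ["beijng", "gogle", "weather", "xyzzyq"]:
        print(q, "->", correct(q))
```

Because the correction operates on romanized strings, the same routine handles a misspelled pinyin term such as "beijng" and a misspelled English term such as "gogle" without any language-specific logic, which is the intuition the abstract describes.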