Chinese New Word Detection from Query Logs.

Yan Zhang, Maosong Sun, Yang Zhang
DOI: https://doi.org/10.1007/978-3-642-17313-4_24
2010-01-01
Abstract: Most existing work on new word detection relies on web pages or other author-centric resources, which require highly complex text processing. This paper instead exploits visitor-centric resources, specifically query logs from a commercial search engine. Because query logs are generated by search engine users and are naturally segmented, the complex text processing can be avoided. Using dynamic time warping, the paper proposes a new word detection algorithm based on trajectory similarity to distinguish new words in the query logs. Experiments on real-world data sets show the effectiveness and efficiency of the proposed algorithm.
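To make the role of dynamic time warping concrete, the following is a minimal sketch of the classic DTW distance between two numeric sequences. It is not the paper's algorithm: the abstract does not say how trajectories are built or compared, so the function name `dtw_distance`, the absolute-difference local cost, and the sample daily query counts are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference between points.
    The abstract does not specify the cost used in the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matches the previous b point
                cost[i][j - 1],      # b[j-1] also matches the previous a point
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Hypothetical daily query counts (made up for illustration only).
steady_term = [5, 5, 6, 5, 5, 6]
bursty_term = [0, 0, 1, 8, 20, 35]
shifted_burst = [0, 1, 8, 20, 35, 35]

print(dtw_distance(bursty_term, shifted_burst))  # same shape, shifted in time
print(dtw_distance(bursty_term, steady_term))    # different shapes
```

Because DTW allows one point to align with several points in the other sequence, two frequency trajectories with the same shape but a time shift come out as close, which is one plausible reason to prefer it over a point-by-point comparison when matching query terms against a reference trajectory.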