Learning a Replacement Model for Query Segmentation with Consistency in Search Logs.

Wei Zhang,Yunbo Cao,Chin-Yew Lin,Jian Su,Chew Lim Tan
2013-01-01
Abstract:Query segmentation is to split a query into a sequence of non-overlapping segments that completely cover all tokens in the query. The majority of methods are unsupervised, however, they are usually not as accurate as supervised methods due to the lack of guidance from labeled data. In this paper, we propose a new paradigm of learning a replacement model with consistency(LRMC), to enable unsupervised training with guidance from search log data. In LRMC, we first assume the existence of a base segmenter (an implementation of any existing approach). Then, we utilize a key observation that queries with a similar intent tend to have consistent segmentations, to automatically collect a set of labeled data from the outputs of the base segmenter by leveraging search log data. Finally, we employ the auto-collected data to train a replacement model for selecting the correct segmentation of a new query from the outputs of the base segmenter. The results show LRMC can improve state-of-the-art methods by an F-Score of around 7%.
What problem does this paper attempt to address?