Query Expansion Based on Term Time Distribution for Microblog Retrieval
Zhong-Yuan HAN, Mu-Yun YANG, Lei-Lei KONG, Hao-Liang QI, Sheng LI
DOI: https://doi.org/10.11897/SP.J.1016.2016.02031
2016-01-01
Chinese Journal of Computers
Abstract: In microblog retrieval, content-based query expansion methods are inadequate because relevant microblog messages are too short to provide reliable term distribution information. Most existing time-based query expansion methods exploit the time profile to shift the prior probability of relevant microblogs. In essence, these methods still cannot escape the restrictions of short texts, since the relevance between expansion terms and the query is still based on microblog content. To address this problem, this paper proposes a query expansion method based on the time distribution of terms, in which the relevance between query terms and expansion terms is measured by the similarity of their time distributions. First, the changes in term frequency across time segments are analyzed, the term time distribution is defined, and its estimation methods are described. Then an approach for estimating the similarity of term time distributions is presented to measure the relevance between query terms and expansion terms, and thereby to select the expansion terms for the re-estimated query model. Two query expansion strategies are given for estimating the query expansion model according to the relevance of the expansion terms to the query. Finally, the term time distribution query model is obtained by integrating the query expansion model with the original query model. Using only the time profile to establish the relevance between query terms and expansion terms avoids the drawbacks that classical content-based query expansion approaches suffer because of the length limit of microblogs. Experiments were carried out on the TREC 2011 and TREC 2012 microblog retrieval collections. Several state-of-the-art baselines were chosen for comparison, including the classical language model, the content-based query expansion method, and the time-based query expansion method. The experimental results show that the term time distribution query model outperforms both the content-based and the time-based approaches.
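
The pipeline outlined in the abstract (estimate a per-term time distribution, score candidate terms by how closely their distribution matches that of the query terms, then interpolate the expansion model with the original query model) can be illustrated with a minimal Python sketch. The abstract does not give the paper's estimators, similarity measure, or its two expansion strategies, so everything concrete here is an assumption made for illustration: additive smoothing, cosine similarity, averaging over query terms, the `top_k` cutoff, the interpolation weight `lam`, and all function names.

```python
import math
from collections import Counter, defaultdict


def term_time_distributions(posts, num_segments, smoothing=0.01):
    """Estimate P(segment | term) from (segment_index, tokens) pairs.

    Additive smoothing is an assumption, not the paper's estimator.
    """
    counts = defaultdict(lambda: [0.0] * num_segments)
    for segment, tokens in posts:
        for term in tokens:
            counts[term][segment] += 1.0
    dists = {}
    for term, row in counts.items():
        total = sum(row) + smoothing * num_segments
        dists[term] = [(c + smoothing) / total for c in row]
    return dists


def cosine(p, q):
    """Cosine similarity of two time distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def expansion_model(query_terms, dists, top_k=2):
    """Weight candidate terms by average time-distribution similarity to the query."""
    known = [t for t in query_terms if t in dists]
    scores = {}
    for term, dist in dists.items():
        if term in query_terms or not known:
            continue
        scores[term] = sum(cosine(dist, dists[q]) for q in known) / len(known)
    best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(score for _, score in best)
    return {term: score / total for term, score in best} if total else {}


def interpolate(query_terms, expansion, lam=0.5):
    """Mix the maximum-likelihood query model with the expansion model."""
    if not expansion:
        lam = 0.0  # nothing to add, so keep the original query model
    original = Counter(query_terms)
    n = sum(original.values())
    model = {t: (1 - lam) * c / n for t, c in original.items()}
    for term, weight in expansion.items():
        model[term] = model.get(term, 0.0) + lam * weight
    return model


# Toy data: four time segments; "earthquake" bursts early, "coffee" late.
posts = [
    (0, ["earthquake", "japan"]),
    (0, ["earthquake", "tsunami"]),
    (1, ["earthquake", "tsunami", "japan"]),
    (2, ["coffee"]),
    (3, ["coffee", "morning"]),
    (3, ["coffee"]),
]
dists = term_time_distributions(posts, num_segments=4)
expansion = expansion_model(["earthquake"], dists, top_k=2)
model = interpolate(["earthquake"], expansion, lam=0.5)
```

In this toy collection the terms that burst in the same segments as the query term ("japan" and "tsunami") are selected, while "coffee" is not, even though no content overlap between messages is consulted when scoring candidates. A real system would draw candidates from a much larger vocabulary and would need the paper's actual estimators and strategies.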