SogouQ: The First Large-Scale Test Collection with Click Streams Used in a Shared-Task Evaluation

Ruihua Song, Min Zhang, Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou
DOI: https://doi.org/10.1007/978-981-15-5554-1_10
2021-01-01
Abstract: Search logs are highly valuable for information retrieval research. In this chapter, we introduce SogouQ, a real Chinese query log dataset released by Sogou Inc. in 2010 for the NTCIR-9 Intent task. SogouQ contains more than 30 million clicks collected in 2008. It is the first large-scale query log used in a shared-task evaluation (namely, the NTCIR tasks). SogouQ has since been adopted in a number of follow-up evaluation tasks, including NTCIR-10 Intent-2, NTCIR-11 IMine, and NTCIR-12 IMine-2, as well as in several domestic Chinese evaluation tasks. SogouQ has also had a broader impact on other research areas, such as natural language processing and social science, and has been acquired by more than 200 institutions.