Sogou-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions

Jia Chen, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma
2019-01-01
Abstract: Web search session data is valuable for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, and click-through rate (CTR) prediction. Numerous studies have shown the great potential of using context information to optimize search systems. The well-known TREC Session Tracks have greatly advanced research in this area. However, these datasets are mainly collected through user studies or crowdsourcing experiments and normally contain only tens to thousands of sessions, which is insufficient for investigating more sophisticated models. To tackle this obstacle, we present a new dataset that contains 147,155 refined web search sessions with both click-based and human-annotated relevance labels. The sessions are sampled from a huge search log and thus reflect real search scenarios. The proposed dataset can support a wide range of session-level or task-based IR studies. As an example, we test several interactive search models with both the PSCM-based and human relevance labels provided by this dataset and report their performance as a reference for future studies of session search.