An Inverse Retrieval Method Via Query Generation for Xiaohongshu’s Search Engine

Yuantao Fan, Xinyu Tu, Ruifan Li
DOI: https://doi.org/10.1007/978-981-97-5675-9_31
2024-01-01
Abstract: In real-world information retrieval, the timely retrieval of the latest documents has gained significant attention in recent years. In this paper, we develop an effective retrieval method for search engines, i.e., inverse retrieval. We propose a two-stage contrastive strategy to train the doc2query model, the component of inverse retrieval. We perform offline or nearline computations to generate queries and then build or update an index that maps each query to tuples of document and score. We have implemented an offline and a nearline retrieval channel at Xiaohongshu. Both channels showed substantial improvement during A/B tests. To make our work reproducible, we release the QD100K dataset with 111K documents and 23M query-doc pairs. Our experimental results on QD100K and MS MARCO show the effectiveness of our method. All our code and datasets are available at https://github.com/fytxlj/InverseRetrievalDataset.
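The index described in the abstract, from a generated query to (document, score) tuples, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper or its repository: the `InverseIndex` class, the `normalize` helper, and the hand-written queries and scores that stand in for the output of a trained doc2query model. The sketch only shows the data structure, assuming that a lookup is an exact match on a normalized query string.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so equivalent queries share a key."""
    return " ".join(query.lower().split())


class InverseIndex:
    """Hypothetical query -> [(doc_id, score)] index (illustration only)."""

    def __init__(self) -> None:
        # Maps normalized query -> {doc_id: score}.
        self._postings: Dict[str, Dict[str, float]] = defaultdict(dict)

    def update(self, doc_id: str, scored_queries: Iterable[Tuple[str, float]]) -> None:
        """Add or overwrite the entries for one document.

        In an offline or nearline channel, scored_queries would come from a
        doc2query model; here they are supplied by the caller (assumption).
        """
        for query, score in scored_queries:
            self._postings[normalize(query)][doc_id] = score

    def search(self, query: str, k: int = 10) -> List[Tuple[str, float]]:
        """Return up to k (doc_id, score) tuples, highest score first."""
        hits = self._postings.get(normalize(query), {})
        return sorted(hits.items(), key=lambda item: item[1], reverse=True)[:k]


# Usage: queries and scores below are made up for illustration.
index = InverseIndex()
index.update("doc1", [("Red Lipstick", 0.9), ("makeup tips", 0.4)])
index.update("doc2", [("red lipstick", 0.7)])
results = index.search("red  lipstick")
```

Because documents are attached to queries ahead of time, serving a request reduces to a key lookup, and a newly published document becomes retrievable as soon as `update` is called for it. That property is one plausible reading of why the nearline channel suits timely retrieval, though the production system at Xiaohongshu is not described in enough detail here to confirm its design.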
What problem does this paper attempt to address?