CL4DIV: A Contrastive Learning Framework for Search Result Diversification.

Zhirui Deng, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen
DOI: https://doi.org/10.1145/3616855.3635851
2024-01-01
Abstract: Search result diversification aims to provide a diversified document ranking list that covers as many query intents as possible and satisfies the various information needs of different users. Existing approaches usually represent documents with pretrained embeddings (such as doc2vec and GloVe). These representations cannot adequately capture a document's content, and they make it hard to model which of the given query's user intents a document covers. Moreover, the limited amount of labeled data for search result diversification makes it even harder to learn effective document representations. To alleviate these problems and learn more effective document representations, we propose a Contrastive Learning framework for search result DIVersification (CL4DIV). Specifically, we design three contrastive learning tasks from the perspectives of subtopics, documents, and candidate document sequences, which correspond to three essential elements of search result diversification. These tasks are used to pretrain the document encoder and the document sequence encoder of the diversified ranking model. Experimental results show that CL4DIV significantly outperforms all existing diversification models. Further analysis demonstrates that our method has wide applicability and can also be used to improve several existing methods.
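The abstract does not state the exact training objective of the three contrastive tasks. Contrastive pretraining is commonly built on an InfoNCE-style loss, so the sketch below illustrates that generic objective in pure Python, under the assumption that a similar loss applies. It is not CL4DIV's actual formulation. The functions `cosine` and `info_nce` and the `temperature` value of 0.1 are hypothetical choices for illustration. In a subtopic-level task, one might take a document embedding as the anchor, a document covering the same subtopic as the positive, and documents covering other subtopics as negatives.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """Generic InfoNCE loss for a single anchor (illustrative sketch only).

    Returns -log( exp(s+/t) / sum_j exp(s_j/t) ), where s+ is the anchor's
    similarity to the positive and the sum runs over the positive and all
    negatives. Lower loss means the positive stands out more clearly.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    # Toy 2-d "document embeddings" (made up for illustration).
    doc = [1.0, 0.0]
    same_subtopic = [0.9, 0.1]
    other_subtopics = [[0.0, 1.0], [-1.0, 0.0]]
    print(info_nce(doc, same_subtopic, other_subtopics))
```

Minimizing this loss pulls the anchor toward its positive and pushes it away from the negatives. This is the general mechanism by which contrastive pretraining can make document embeddings reflect subtopic coverage. The temperature controls how sharply the loss concentrates on the hardest negatives.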