Abstract: Dense retrieval conducts text retrieval in the embedding space and has shown many advantages over sparse retrieval. Existing dense retrievers optimize representations of queries and documents with contrastive training and map them to the embedding space. The embedding space is optimized by aligning matched query-document pairs and pushing negative documents away from the query. However, in such a training paradigm, the queries are optimized only to align with the documents and are coarsely positioned, leading to an anisotropic query embedding space. In this paper, we analyze the embedding space distributions and propose an effective training paradigm, Contrastive Dual Learning for Approximate Nearest Neighbor (DANCE), to learn fine-grained query representations for dense retrieval. DANCE incorporates an additional dual training objective of query retrieval, inspired by the classic information retrieval training axiom, query likelihood. With contrastive learning, the dual training objective of DANCE learns more tailored representations for queries and documents to keep the embedding space smooth and uniform, which improves the ranking performance of DANCE on the MS MARCO document retrieval task. Unlike ANCE, which is optimized only with the document retrieval task, DANCE concentrates the query embeddings closer to document representations while making the document distribution more discriminative. Such a concentrated query embedding distribution assigns more uniform negative sampling probabilities to queries and helps to sufficiently optimize query representations in the query retrieval task. Our code is released at https://github.com/thunlp/DANCE.
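The dual objective described in the abstract can be illustrated with a minimal sketch. It is not the released DANCE implementation: it assumes in-batch negatives in place of the ANN-sampled negatives the abstract refers to, dot-product scoring, and a hypothetical `dual_weight` parameter for balancing the two losses. The function name `dual_contrastive_loss` is also illustrative.

```python
import numpy as np


def log_softmax(scores):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def dual_contrastive_loss(q, d, dual_weight=1.0):
    """Document-retrieval loss plus a dual query-retrieval loss.

    q, d: arrays of shape (batch, dim); row i of q matches row i of d.
    Assumption: the other rows in the batch serve as negatives.
    dual_weight is a hypothetical balancing factor, not taken from the paper.
    """
    scores = q @ d.T                      # scores[i, j] = <q_i, d_j>
    idx = np.arange(len(q))
    # Document retrieval: for each query, pick its document among all documents.
    doc_loss = -log_softmax(scores)[idx, idx].mean()
    # Dual task (query retrieval): for each document, pick its query among all queries.
    query_loss = -log_softmax(scores.T)[idx, idx].mean()
    return doc_loss + dual_weight * query_loss, doc_loss, query_loss


rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
documents = rng.normal(size=(4, 8))
total, doc_loss, query_loss = dual_contrastive_loss(queries, documents, dual_weight=0.5)
print(total, doc_loss, query_loss)
```

Both terms share one score matrix. The document-retrieval term normalizes over each row (one query against all documents), while the dual term normalizes over each column (one document against all queries), so the dual term is the one in which queries act as negatives for each other.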

Unsupervised Dense Information Retrieval with Contrastive Learning

Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

More Robust Dense Retrieval with Contrastive Dual Learning

Unsupervised Multilingual Dense Retrieval via Generative Pseudo Labeling

CodeRetriever: Unimodal and Bimodal Contrastive Learning for Code Search

Unsupervised Domain Adaption for Neural Information Retrieval

Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval

Unsupervised Large Language Model Alignment for Information Retrieval Via Contrastive Feedback

Category-Level Contrastive Learning for Unsupervised Hashing in Cross-Modal Retrieval

Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

Cross-modal Contrastive Learning for Generalizable and Efficient Image-text Retrieval

Leveraging LLMs for Unsupervised Dense Retriever Ranking

CodeRetriever: A Large Scale Contrastive Pre-Training Method for Code Search

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models

Learning Cross-Lingual IR from an English Retriever

Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking

Cross-Lingual Training with Dense Retrieval for Document Retrieval

Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems

SynC: A Dense Retrieval Method based on Syntactical Contrastive Learning