SynC: A Dense Retrieval Method based on Syntactical Contrastive Learning.

Hongjin Tao, Jun Zeng, Yang Yu, Ziwei Wang, Xiaolin Hu
DOI: https://doi.org/10.1109/IJCNN54540.2023.10191855
2023-01-01
Abstract: Recently, dense retrieval methods have significantly outperformed sparse retrieval techniques and have become the mainstream approach to relevant-passage retrieval. Dense retrieval methods encode queries and passages into a dense representation space, apply contrastive learning to obtain more representative vectors, and then retrieve the most similar passage by the inner product of the dense vectors. However, previous work relies on extremely large batch sizes, many training epochs, and large-scale pre-trained models, which results in tremendous computational resource consumption. The expensive hardware prerequisites and low training efficiency also constrain the development of the dense retrieval community. Therefore, we present an alternative solution that improves training efficiency and quality through syntactical contrastive learning with specially designed masking strategies. To alleviate the computational consumption problem, this paper proposes query-based and passage-based masking strategies to obtain syntactically isolated representations. In addition, rather than considering only query-to-passage similarity during contrastive learning, we also consider query-to-query and passage-to-passage similarity when training the dual-encoder retriever. Experiments show that the proposed approach achieves results competitive with previous state-of-the-art dense retrieval methods and with strong baselines while using a small batch size and few training epochs.
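The abstract combines a standard dense-retrieval recipe (dual encoder, inner-product scoring, contrastive training) with extra query-to-query and passage-to-passage terms. Below is a minimal Python sketch of those pieces under stated assumptions, not the authors' implementation: encoders are replaced by precomputed vectors, the contrastive objective is assumed to be ordinary in-batch InfoNCE, and the second "views" used for the q2q and p2p terms, the weights `w_qq` and `w_pp`, and the temperature `tau` are all hypothetical, because the abstract does not say how SynC's masking produces those views or how the terms are combined.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product: the similarity score used for dense retrieval."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query: Vector, passages: List[Vector]) -> int:
    """Return the index of the passage with the highest inner product."""
    scores = [dot(query, p) for p in passages]
    return max(range(len(passages)), key=lambda i: scores[i])


def info_nce(anchors: List[Vector], targets: List[Vector], tau: float = 1.0) -> float:
    """In-batch InfoNCE: anchors[i] should match targets[i], and every other
    targets[j] in the batch acts as a negative. Returns the mean loss."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, t) / tau for t in targets]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the positive
    return total / len(anchors)


def combined_loss(
    queries: List[Vector],
    passages: List[Vector],
    queries_view2: List[Vector],   # hypothetical second encoding of each query
    passages_view2: List[Vector],  # hypothetical second encoding of each passage
    w_qq: float = 1.0,             # hypothetical weight for the q2q term
    w_pp: float = 1.0,             # hypothetical weight for the p2p term
    tau: float = 1.0,
) -> float:
    """Assumed form: q2p InfoNCE plus weighted q2q and p2p InfoNCE terms."""
    l_qp = info_nce(queries, passages, tau)
    l_qq = info_nce(queries, queries_view2, tau)
    l_pp = info_nce(passages, passages_view2, tau)
    return l_qp + w_qq * l_qq + w_pp * l_pp


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    p = [[0.9, 0.1], [0.1, 0.9]]
    print(retrieve(q[0], p))            # 0
    print(round(info_nce(q, p), 4))     # 0.3711
```

Because every other item in the batch serves as a negative, the batch size directly controls how many negatives each example sees, which is the dependence on very large batches that the abstract aims to reduce.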
What problem does this paper attempt to address?