Understanding the Multi-vector Dense Retrieval Models

Qi Liu,Jiaxin Mao
DOI: https://doi.org/10.1145/3583780.3615282
2023-01-01
Abstract:While dense retrieval has become a promising alternative to the traditional text retrieval models, such as BM25, some recent studies show that multi-vector dense retrieval models are more effective than the single-vector method in retrieval tasks. However, due to a lack of interpretability, why the multi-vector method outperforms its single-vector counterpart has not been fully studied. To fill this research gap, in this work, we investigate and compare the behaviors of single-vector and multi-vector models in retrieval. Specifically, we analyze the vocabulary distribution of dense representations by mapping them back to the sparse, vocabulary space. Our empirical findings show that the multi-vector representation has more lexical overlaps between queries and passages. Additionally, we show that this feature of multi-vector representation can enhance its ranking performance when a given passage can fulfill different information needs and thus can be retrieved by different queries. These results shed light on the internal mechanisms of multi-vector representation and may provide new perspectives for future research.
What problem does this paper attempt to address?