LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy
2024-08-22
Abstract: Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 4 popular LLMs ranging from 1.3B to 8B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data (as of May 24, 2024). Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper asks how decoder-only large language models (LLMs) can be turned into strong text encoders. It proposes LLM2Vec, a simple unsupervised approach that can be applied to any decoder-only LLM and consists of three steps: enabling bidirectional attention, masked next token prediction (MNTP), and unsupervised contrastive learning. Together, these steps improve the model's text representations without labeled data: the transformed models outperform encoder-only models by a large margin on word-level tasks and set a new unsupervised state of the art on the Massive Text Embeddings Benchmark (MTEB). Combining LLM2Vec with supervised contrastive learning improves performance further, reaching state-of-the-art MTEB results among models trained only on publicly available data (as of May 24, 2024). The study also finds that some models, such as Mistral-7B, perform well after bidirectional attention is enabled even without any fine-tuning, a phenomenon that warrants further exploration. Overall, the work shows that decoder-only LLMs can be converted into general-purpose text encoders in a parameter-efficient way.
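The three steps can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the function names, the nested-list data layout, and the temperature value are all assumptions made for illustration. Reading the prediction for a masked token from the previous position follows my understanding of how the paper defines MNTP, and the loss shown is a generic in-batch contrastive (InfoNCE-style) objective standing in for the unsupervised contrastive step.

```python
import math


def attention_mask(n, bidirectional):
    """Step 1: mask[i][j] is True when position i may attend to position j.

    A causal (decoder) mask allows only j <= i; the bidirectional mask
    allows every pair of positions.
    """
    return [[bidirectional or j <= i for j in range(n)] for i in range(n)]


def mntp_example(token_ids, masked_positions, mask_id):
    """Step 2 (sketch): build inputs and targets for masked next token prediction.

    Assumption: the prediction for a masked token at position i is read
    from the model output at position i - 1, so targets are keyed by i - 1.
    """
    inputs = list(token_ids)
    targets = {}
    for i in masked_positions:
        if i == 0:
            continue  # no previous position to read a prediction from
        inputs[i] = mask_id
        targets[i - 1] = token_ids[i]
    return inputs, targets


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(view_a, view_b, temperature=0.05):
    """Step 3 (sketch): in-batch contrastive loss over two views of a batch.

    view_a[i] and view_b[i] are embeddings of the same text (for example,
    two forward passes with different dropout); every other row of view_b
    acts as a negative. The temperature value is an assumption.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

In this sketch, `attention_mask(n, True)` removes the causal restriction, `mntp_example` shows how masked inputs and shifted targets line up, and `contrastive_loss` is small when matching rows of the two views are more similar to each other than to the other rows in the batch.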