LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy

2024-08-22

Abstract: Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 4 popular LLMs ranging from 1.3B to 8B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data (as of May 24, 2024). Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.
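The first two steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the placeholder `mask_id`, and the `mask_prob` value are hypothetical. Step 1 amounts to replacing the causal attention mask (token i sees only positions up to i) with an all-ones mask. Step 2 masks some input tokens and, keeping the next-token-prediction setup the model was pretrained on, trains the output at position i - 1 to predict the masked token at position i.

```python
import random


def attention_mask(seq_len, bidirectional):
    """Return a seq_len x seq_len matrix where mask[i][j] is True
    if position i may attend to position j.

    Causal (decoder) attention: j <= i. Bidirectional: every pair allowed.
    """
    return [[bidirectional or j <= i for j in range(seq_len)]
            for i in range(seq_len)]


def mntp_example(token_ids, mask_id, mask_prob, rng):
    """Build one masked next token prediction (MNTP) training example.

    mask_id and mask_prob are hypothetical placeholders. Returns
    (inputs, labels): labels[k] is the token the output at position k
    should predict, or None if position k contributes no loss.
    A masked token at position i is predicted from position i - 1.
    """
    inputs = list(token_ids)
    labels = [None] * len(token_ids)
    # Position 0 has no predecessor to predict it from, so skip it.
    for i in range(1, len(token_ids)):
        if rng.random() < mask_prob:
            inputs[i] = mask_id
            labels[i - 1] = token_ids[i]
    return inputs, labels


if __name__ == "__main__":
    print(attention_mask(3, bidirectional=False))
    # [[True, False, False], [True, True, False], [True, True, True]]
    print(mntp_example([11, 12, 13, 14, 15], mask_id=0, mask_prob=1.0,
                       rng=random.Random(0)))
    # ([11, 0, 0, 0, 0], [12, 13, 14, 15, None])
```

With `mask_prob=1.0` every position after the first is masked, so `labels` is the original sequence shifted left by one, which makes the next-token relationship explicit.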

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses how to turn decoder-only large language models (LLMs) into strong text encoders. It proposes LLM2Vec, a simple unsupervised approach that can convert any decoder-only LLM into a text encoder in three steps: 1) enabling bidirectional attention, 2) masked next token prediction (MNTP), and 3) unsupervised contrastive learning. These steps improve the model's text representations without any labeled data, and the resulting models reach a new unsupervised state of the art on the Massive Text Embedding Benchmark (MTEB). Combining LLM2Vec with supervised contrastive learning improves performance further, giving state-of-the-art MTEB results among models trained only on publicly available data (as of May 24, 2024). The study also finds that some models, notably Mistral-7B, already perform well once bidirectional attention is enabled, even without any further training, a phenomenon that warrants further investigation. Overall, the work shows that decoder-only LLMs can be effectively transformed into universal text encoders that outperform encoder-only models by a large margin on word-level tasks.
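The third step, unsupervised contrastive learning, follows SimCSE: each sentence is encoded twice with different dropout masks, the two embeddings form a positive pair, and the other sentences in the batch act as negatives. The sketch below computes that in-batch InfoNCE loss over precomputed embeddings. It is an illustration under assumptions, not the paper's code: the function names are hypothetical, the two dropout-driven encoder passes are assumed to happen upstream, and the default temperature of 0.05 is borrowed from SimCSE rather than confirmed for LLM2Vec.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce_loss(view_a, view_b, temperature=0.05):
    """SimCSE-style in-batch contrastive (InfoNCE) loss.

    view_a[i] and view_b[i] are two embeddings of the same sentence
    (assumed to come from two encoder passes with different dropout);
    every view_b[j] with j != i is an in-batch negative for view_a[i].
    Returns the mean cross-entropy over the batch.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # cross-entropy with target index i
    return total / n


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(batch, batch))        # near 0: each positive is clearly closest
    print(info_nce_loss(batch, batch[::-1]))  # about 20: positives and negatives swapped
```

The same loss shape carries over to the supervised setting if `view_b[i]` holds a labeled positive for `view_a[i]` instead of a second dropout view.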
