Abstract: We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation (MetaEOL), for generating high-quality sentence embeddings from Large Language Models (LLMs) without the need for model fine-tuning. Leveraging meta-task prompting, MetaEOL guides LLMs to produce embeddings through a series of carefully designed prompts that address multiple representational aspects. Our comprehensive experiments demonstrate that embeddings averaged from various meta-tasks are versatile embeddings that yield competitive performance on Semantic Textual Similarity (STS) benchmarks and excel in downstream tasks, surpassing contrastive-trained models. Our findings suggest a new scaling law, offering a versatile and resource-efficient approach for embedding generation across diverse scenarios.

What problem does this paper attempt to address?

This paper addresses how to generate high-quality sentence embeddings from large language models (LLMs) without fine-tuning them. It proposes a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation (MetaEOL), which uses a series of carefully designed prompts to guide LLMs toward embeddings that capture multiple aspects of sentence representation. The authors aim for embeddings that perform well on Semantic Textual Similarity (STS) benchmarks, outperform contrastively trained models on downstream tasks, and offer a general, resource-efficient way to generate embeddings for different scenarios.

### Background of the Paper

With the emergence of large language models such as GPT-3 and LLaMA, natural language processing (NLP) has made significant progress. Given task-related instructions or prompts, these models offer promising unsupervised approaches to a variety of NLP tasks. Sentence embedding generation is one important application: the goal is to produce sentence representations that can be used across a wide range of scenarios. However, existing methods usually require extensive fine-tuning, which is resource-intensive.

### Core of the Problem

1. **Limitations of Existing Methods**:
   - **Limitations of Single Prompts**: Existing prompt-based methods usually rely on a single prompt, which can yield embeddings that are too simple or that fail to capture the semantic nuances of a sentence.
   - **Poor Generalization Ability of Task-Specific Embeddings**: Task-specific embeddings perform well on the task they target but generalize poorly to other tasks.
2. **Innovations of MetaEOL**:
   - **Multi-Task Prompting**: MetaEOL defines a set of meta-tasks, each targeting a different application scenario, to guide LLMs to consider sentence representation from multiple perspectives.
   - **Comprehensive Embedding**: Averaging the embeddings from the different meta-tasks produces a single comprehensive sentence embedding, increasing the diversity and richness of the representation.

### Experimental Verification

The paper verifies the effectiveness of MetaEOL through extensive experiments. The results show that:

1. **Competitiveness without Training**: The embeddings generated by MetaEOL perform well on STS tasks and even outperform contrastively trained models on some downstream tasks.
2. **Complementarity of Meta-Tasks**: Performance continues to improve as the number of meta-tasks increases, which supports the claim that the meta-tasks are complementary.
3. **Impact of Layer Selection**: The last layer is not always the most effective, and performance can be improved further with a simple proportional layer selection strategy.

### Conclusion

MetaEOL provides a new unsupervised method that generates high-quality sentence embeddings from large language models without additional fine-tuning. The method is competitive in performance, resource-efficient, and suitable for multiple application scenarios.
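The averaging step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording in `META_TASK_PROMPTS`, the function names, and the `toy_embed` stand-in (which hashes the prompt instead of calling an LLM) are all assumptions made for illustration. In practice `embed_fn` would be replaced by a function that feeds the filled prompt to an LLM and returns a vector taken from one of its hidden layers.

```python
import hashlib
import math
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical meta-task templates. The wording is illustrative only and is
# not taken from the paper; each ends by asking for a one-word answer.
META_TASK_PROMPTS = [
    'Classify the topic of this sentence: "{sentence}". The topic in one word is: "',
    'Judge the sentiment of this sentence: "{sentence}". The sentiment in one word is: "',
    'Paraphrase this sentence: "{sentence}". Its meaning in one word is: "',
    'Extract the key information from this sentence: "{sentence}". In one word: "',
]


def toy_embed(prompt: str, dim: int = 8) -> Vector:
    """Stand-in for an LLM (assumption): a deterministic pseudo-embedding
    derived from a hash of the prompt, so the sketch runs without a model."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [byte / 255.0 - 0.5 for byte in digest[:dim]]


def meta_task_embedding(
    sentence: str,
    templates: Sequence[str],
    embed_fn: Callable[[str], Vector],
) -> Vector:
    """Embed the sentence once per meta-task prompt and average the vectors."""
    vectors = [embed_fn(t.format(sentence=sentence)) for t in templates]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity, the usual score for comparing sentence embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    emb_a = meta_task_embedding("A cat sits on the mat.", META_TASK_PROMPTS, toy_embed)
    emb_b = meta_task_embedding("A dog runs in the park.", META_TASK_PROMPTS, toy_embed)
    print(len(emb_a), cosine_similarity(emb_a, emb_b))
```

Because `toy_embed` is only a hash, the similarity it prints carries no semantic meaning; the sketch shows the data flow (one prompt per meta-task, one vector per prompt, a plain mean across them), not the quality of the resulting embeddings.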

Meta-Task Prompting Elicits Embeddings from Large Language Models

Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond

Enhancing Embedding Performance through Large Language Model-based Text Enrichment and Rewriting

GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings

Improving Text Embeddings with Large Language Models

Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models

Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

EmbedLLM: Learning Compact Representations of Large Language Models

Embedding-Aligned Language Models

Making Text Embedders Few-Shot Learners

Retrieve Anything To Augment Large Language Models

The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics

Domain-specific meta-embedding with latent semantic structures

Large Language Models are Good Multi-lingual Learners : When LLMs Meet Cross-lingual Prompts

Large Language Models Prompting With Episodic Memory

Meta Semantic Template for Evaluation of Large Language Models

Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding

EPA: Easy Prompt Augmentation on Large Language Models via Multiple Sources and Multiple Targets