Abstract: High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient and hinder optimization processes. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.
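
The saturation argument in the abstract can be made concrete with a short derivation. This is only a sketch: the symbols $\theta$ (the angle between two embeddings) and $\mathcal{L}$ (any loss that depends on the embeddings only through $\cos\theta$) are introduced here for illustration and do not appear in the abstract.

```latex
\frac{d}{d\theta}\cos\theta = -\sin\theta,
\qquad
\frac{\partial \mathcal{L}}{\partial \theta}
  = \frac{\partial \mathcal{L}}{\partial \cos\theta}\cdot(-\sin\theta),
\qquad
|\sin\theta| \to 0 \ \text{ as } \ \theta \to 0 \ \text{ or } \ \theta \to \pi .
```

Under these assumptions, the gradient with respect to the angle shrinks toward zero whenever two embeddings are nearly parallel or nearly opposite, which is the saturation zone the abstract refers to.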

What problem does this paper attempt to address?

The paper primarily addresses the issue of gradient vanishing in text embedding, particularly the saturation problem caused by using cosine similarity as an optimization target. It proposes a new angle-optimized text embedding model called AnglE. The core contributions and problems solved by the paper can be summarized as follows:

1. **Problem Background**: High-quality text embeddings are crucial for Semantic Textual Similarity (STS) tasks, which are foundational for the application of Large Language Models (LLMs). Current text embedding models mostly rely on the cosine function to measure similarity between texts. However, due to the saturation regions of the cosine function, where gradients approach zero, the model struggles to learn subtle distinctions, thereby affecting the optimization process.
2. **Proposed Method**: To address the aforementioned issue, the paper introduces a new model named AnglE, which incorporates the concept of angle optimization in the complex space to mitigate the negative impact of the cosine function's saturation regions on the learning process. Specifically, AnglE divides text embeddings into real and imaginary parts and optimizes their relationship by calculating the angle difference between two text embeddings. This approach not only optimizes the cosine similarity between texts but also their angular differences.
3. **Experimental Evaluation**: To validate the effectiveness of the proposed model, the authors conducted extensive experimental evaluations. They used existing short-text STS datasets and also constructed a new long-text STS dataset, the GitHub Issues Similarity Dataset. Additionally, the paper explores domain-specific STS scenarios and the model's performance with limited annotated data.
4. **Main Contributions**:
   - Investigated the negative impact of the cosine function's saturation regions on text similarity tasks and proposed a novel angle-optimized text embedding model to alleviate this issue.
   - Extended existing STS benchmark datasets by adding a high-quality long-text STS dataset collected from GitHub Issues, making the model performance evaluation more comprehensive.
   - Conducted extensive experiments on various STS tasks, demonstrating that AnglE significantly improves the quality of text embeddings and performs excellently in different scenarios.

In summary, this paper aims to solve the learning difficulty caused by the saturation regions of the cosine function in existing text embedding models. By introducing an angle optimization method, it improves the quality of text embeddings and demonstrates the effectiveness and superiority of the proposed model through experiments.
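
The angle-difference idea described under "Proposed Method" can be illustrated with a minimal NumPy sketch. Several details are assumptions made for illustration and are not confirmed by this summary: the function name `angle_difference` is hypothetical, the embedding is assumed to be split into two equal halves (first half as the real part, second half as the imaginary part), and the angle is taken from the complex quotient of the two embeddings. The training loss built on top of this quantity is not described here and is not reproduced.

```python
import numpy as np


def angle_difference(u, v):
    """Per-dimension absolute angle (radians) between two embeddings in complex space.

    Assumption: the first half of each embedding is treated as the real part
    and the second half as the imaginary part. The last axis must have even length.
    """
    a, b = np.split(np.asarray(u, dtype=float), 2, axis=-1)
    c, d = np.split(np.asarray(v, dtype=float), 2, axis=-1)
    z = a + 1j * b  # complex view of u
    w = c + 1j * d  # complex view of v
    # z / w has the same angle as z * conj(w), which avoids dividing by |w|^2.
    q = z * np.conj(w)
    # np.angle returns values in (-pi, pi]; the absolute value lies in [0, pi].
    return np.abs(np.angle(q))


# Usage: a 2-dimensional embedding yields one complex number per text.
u = np.array([1.0, 0.0])  # complex view: 1 + 0i
v = np.array([0.0, 1.0])  # complex view: 0 + 1i
print(angle_difference(u, v))  # quarter turn, i.e. pi / 2
print(angle_difference(u, u))  # identical embeddings give angle 0
```

Unlike the cosine, the angle itself changes at a constant rate as one embedding rotates toward another, which is the intuition behind optimizing it directly.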

AnglE-optimized Text Embeddings
