AnglE-optimized Text Embeddings

Xianming Li, Jing Li
2024-05-16
Abstract: High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient and hinder optimization processes. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses vanishing gradients in text embedding training, specifically the saturation problem that arises when cosine similarity is used as the optimization target. It proposes a new angle-optimized text embedding model called AnglE. The core contributions and problems solved by the paper can be summarized as follows:

1. **Problem Background**: High-quality text embeddings are crucial for Semantic Textual Similarity (STS) tasks, which are foundational for applications of Large Language Models (LLMs). Current text embedding models mostly rely on the cosine function to measure similarity between texts. However, in the saturation regions of the cosine function the gradients approach zero, so the model struggles to learn subtle distinctions and optimization is impaired.
2. **Proposed Method**: To address this issue, the paper introduces a new model named AnglE, which applies angle optimization in complex space to mitigate the negative impact of the cosine function's saturation regions on learning. Specifically, AnglE divides text embeddings into real and imaginary parts and optimizes their relationship by calculating the angle difference between two text embeddings. This approach optimizes not only the cosine similarity between texts but also their angular differences.
3. **Experimental Evaluation**: To validate the proposed model, the authors conducted extensive experiments. They used existing short-text STS datasets and also constructed a new long-text STS dataset, the GitHub Issues Similarity Dataset. Additionally, the paper explores domain-specific STS scenarios and the model's performance with limited annotated data.
4. **Main Contributions**:
   - Investigated the negative impact of the cosine function's saturation regions on text similarity tasks and proposed a novel angle-optimized text embedding model to alleviate this issue.
   - Extended existing STS benchmark datasets by adding a high-quality long-text STS dataset collected from GitHub Issues, making the evaluation of model performance more comprehensive.
   - Conducted extensive experiments on various STS tasks, demonstrating that AnglE improves the quality of text embeddings across short-text, long-text, and domain-specific scenarios.

In summary, this paper aims to solve the learning difficulty caused by the saturation regions of the cosine function in existing text embedding models. By introducing angle optimization, it improves the quality of text embeddings, and the experiments demonstrate the effectiveness of the proposed model.
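
The saturation argument and the real/imaginary split described above can be illustrated with a short Python sketch. It is not the authors' implementation: the half-and-half split of the embedding, the function names, and the use of the phase of a complex quotient as the "angle difference" are all assumptions made for illustration, and the training objective itself is not shown.

```python
import cmath
import math


def cosine_grad_wrt_angle(theta: float) -> float:
    """Derivative of cos(theta) with respect to theta, i.e. -sin(theta).

    Its magnitude is close to zero near theta = 0 and theta = pi, which is
    the saturation behaviour the summary refers to.
    """
    return -math.sin(theta)


def split_complex(embedding: list[float]) -> list[complex]:
    """Pair the first half (real parts) with the second half (imaginary parts).

    The half-and-half split is an assumption of this sketch.
    """
    if len(embedding) % 2 != 0:
        raise ValueError("embedding length must be even")
    half = len(embedding) // 2
    return [complex(re, im) for re, im in zip(embedding[:half], embedding[half:])]


def angle_differences(x: list[float], y: list[float]) -> list[float]:
    """Per-dimension angle difference in radians, taken as the phase of z / w.

    Raises ZeroDivisionError if any complex component of y is exactly zero.
    """
    z = split_complex(x)
    w = split_complex(y)
    if len(z) != len(w):
        raise ValueError("embeddings must have the same length")
    return [cmath.phase(zi / wi) for zi, wi in zip(z, w)]


if __name__ == "__main__":
    # The gradient magnitude shrinks as the angle approaches 0 or pi.
    for theta in (0.01, math.pi / 4, math.pi / 2, math.pi - 0.01):
        print(f"theta={theta:.2f}  |d cos/d theta|={abs(cosine_grad_wrt_angle(theta)):.4f}")

    # Toy 4-dimensional embeddings, i.e. two complex components each.
    print(angle_differences([1.0, 0.5, 0.0, 0.5], [0.0, 0.5, 1.0, 0.5]))
```

Working with the angle directly, rather than only with its cosine, is what lets an objective keep a usable signal where the cosine is nearly flat; how AnglE combines this with its other loss terms is left to the paper.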