Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning

Trapoom Ukarapol, Zhicheng Lee, Amy Xin

2024-08-02

Abstract: While Large Language Models show remarkable performance in natural language understanding, their resource-intensive nature makes them less accessible. In contrast, smaller language models such as MiniCPM offer more sustainable scalability, but often underperform without specialized optimization. In this paper, we explore the enhancement of smaller language models through the improvement of their text embeddings. We select three language models, MiniCPM, Phi-2, and Gemma, to conduct contrastive fine-tuning on the NLI dataset. Our results demonstrate that this fine-tuning method enhances the quality of text embeddings for all three models across various benchmarks, with MiniCPM showing the most significant improvement, an average performance gain of 56.33%. The contrastive fine-tuning code is publicly available at https://github.com/trapoom555/Language-Model-STS-CFT.

Computation and Language

What problem does this paper attempt to address?

The paper addresses the weak text-embedding performance of small language models. The researchers aim to improve the embedding quality of these models so that they perform better on natural language understanding tasks, particularly in resource-constrained settings. The core contribution is a contrastive fine-tuning method for smaller language models. The study selects three such models (MiniCPM, Phi-2, and Gemma) and fine-tunes them contrastively on the NLI (natural language inference) dataset. The results show that the method improves embedding quality for all three models across various benchmarks. MiniCPM shows the largest improvement, with an average performance gain of 56.33%.

The paper also reports several ablation studies: the impact of different learning rates, the effectiveness of prompting, training data efficiency, and the effect of adding a hard-negative penalty to the objective function. These studies further support the method and reveal some notable findings, for example that additional prompts may not bring the expected improvement for models that have already been fine-tuned.

In summary, the research uses contrastive fine-tuning to improve small language models on text embedding tasks, making them a more attractive option for resource-constrained scenarios.
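To make the idea of a contrastive objective with hard negatives concrete, here is a minimal sketch in plain Python. It assumes an InfoNCE-style loss over cosine similarities, where each anchor is pulled toward its own positive and pushed away from the other positives in the batch and from all hard negatives. This summary does not state the paper's exact objective, so the function names, the `temperature` default of 0.05, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, hard_negatives, temperature=0.05):
    """Mean InfoNCE-style loss over a batch (hypothetical helper).

    anchors[i] should match positives[i]. Every other positive in the
    batch acts as an in-batch negative, and every entry of
    hard_negatives is an additional negative for every anchor.
    The temperature value is an assumed default, not taken from the paper.
    """
    candidates = positives + hard_negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the true positive (index i).
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy 2-D "embeddings", made up purely for illustration.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
hard_negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned_loss = info_nce_loss(anchors, positives, hard_negatives)
# Swapping the positives mismatches every pair, so the loss should rise.
mismatched_loss = info_nce_loss(anchors, positives[::-1], hard_negatives)
print(aligned_loss, mismatched_loss)
```

In practice the vectors would come from the language model being fine-tuned, and the loss would be computed with an autodiff framework so that gradients can update the model; the sketch only shows how the hard negatives enter the denominator alongside the in-batch negatives.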

Cross-model Control: Improving Multiple Large Language Models in One-time Training

Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning

Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores

Improving Text Embeddings with Large Language Models

Enhancing Embedding Performance through Large Language Model-based Text Enrichment and Rewriting

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models

Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model

Evaluating Large Language Models Using Contrast Sets: An Experimental Approach

Finetuning CLIP to Reason about Pairwise Differences

Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Super Tiny Language Models

CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models

LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning

Enhancing SLM via ChatGPT and Dataset Augmentation

Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

Small Language Models Improve Giants by Rewriting Their Outputs

An Emulator for Fine-Tuning Large Language Models using Small Language Models