Word Embeddings Are Steers for Language Models

Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji

2024-06-06

Abstract: Language models (LMs) automatically learn word embeddings during pre-training on language corpora. Although word embeddings are usually interpreted as feature vectors for individual words, their roles in language model generation remain underexplored. In this work, we theoretically and empirically revisit output word embeddings and find that their linear transformations are equivalent to steering language model generation styles. We name such steers LM-Steers and find that they exist in LMs of all sizes. Steering each style requires learning parameters equal to 0.2% of the original LM's size. On tasks such as language model detoxification and sentiment control, LM-Steers achieve performance comparable or superior to state-of-the-art controlled generation methods while maintaining a better balance with generation quality. The learned LM-Steer serves as a lens into text styles: it reveals that word embeddings are interpretable when associated with language model generations, and it can highlight the text spans that most indicate style differences. An LM-Steer is transferable between different language models by an explicit-form calculation. One can also steer LMs continuously simply by scaling the LM-Steer, or compose multiple LM-Steers by adding their transformations. Our code is publicly available at https://github.com/Glaciohound/LM-Steer.
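The abstract's central claim, that a linear transformation of the output word embeddings steers generation, can be illustrated in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes the steer takes the form (I + εW) applied to every output embedding e_v, where W is a learned d×d matrix and ε is a scalar strength, and the function names `steered_logits`, `steered_logits_fast`, and `compose_steers` are hypothetical. Random arrays stand in for a real LM's hidden state and embedding matrix.

```python
import numpy as np


def steered_logits(h, E, W, eps):
    """Next-token logits after transforming every output embedding.

    h:   (d,) final hidden state at the current position
    E:   (V, d) output word-embedding matrix, one row e_v per token
    W:   (d, d) steering matrix (assumed form; one learned W per style)
    eps: scalar steering strength
    Returns (V,) logits whose entries are h^T (I + eps * W) e_v.
    """
    d = E.shape[1]
    # Row v of E_steered is ((I + eps * W) e_v)^T.
    E_steered = E @ (np.eye(d) + eps * W).T
    return E_steered @ h


def steered_logits_fast(h, E, W, eps):
    """Equivalent computation that transforms the hidden state once.

    h^T (I + eps W) e_v = (h + eps * W^T h)^T e_v, so only one extra
    d x d matrix-vector product is needed per decoding step.
    """
    return E @ (h + eps * (W.T @ h))


def compose_steers(steers):
    """Combine several (eps_i, W_i) pairs by adding their transformations."""
    d = steers[0][1].shape[0]
    total = np.zeros((d, d))
    for eps, W in steers:
        total += eps * W
    return total


# Toy demo with random data; real LMs use d in the hundreds or thousands.
rng = np.random.default_rng(0)
d, V = 8, 50
h = rng.normal(size=d)
E = rng.normal(size=(V, d))
W_a = 0.1 * rng.normal(size=(d, d))  # hypothetical steer for one style
W_b = 0.1 * rng.normal(size=(d, d))  # hypothetical steer for another style

base = E @ h
# eps = 0 leaves the model's logits unchanged.
assert np.allclose(steered_logits(h, E, W_a, 0.0), base)
# The embedding-side and hidden-state-side computations agree.
assert np.allclose(steered_logits(h, E, W_a, 2.0),
                   steered_logits_fast(h, E, W_a, 2.0))
# Composition: one summed steer at eps = 1 adds the individual logit shifts.
W_ab = compose_steers([(1.5, W_a), (-0.5, W_b)])
shift_a = steered_logits(h, E, W_a, 1.5) - base
shift_b = steered_logits(h, E, W_b, -0.5) - base
assert np.allclose(steered_logits(h, E, W_ab, 1.0), base + shift_a + shift_b)
```

Because h^T(I + εW)e_v equals ((I + εW)^T h)^T e_v, the transform can be applied once to the hidden state instead of to all V embeddings, which is one way to see why decoding cost can stay close to the unsteered model's. Under this parametrization, ε = 0 recovers the original logits, varying ε gives continuous control, and summing several ε_i W_i composes steers.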

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper explores the role of word embeddings in language models (LMs) during generation and proposes a simple yet effective method, **LM-Steer**, for controlling the style of generated text.

1. **Role of Word Embeddings**:
   - The authors find that word embeddings are not just feature vectors for individual words; they are also closely tied to the style of LM generation.
   - Linearly transforming the output word embeddings changes the style of the generated text.
2. **LM-Steer Method**:
   - LM-Steer applies a simple linear transformation to the word embeddings, enabling flexible control over generation style.
   - Steering each style requires learning new parameters equal to only about 0.2% of the original LM's size.
   - LM-Steer performs well on tasks such as detoxification and sentiment control.
3. **Application Cases**:
   - **Detoxification**: reducing the generation of toxic content.
   - **Sentiment Control**: controlling the sentiment of the generated text.
   - **Continuous and Combined Control**: adjusting steering strength continuously and combining multiple styles.
4. **Efficiency Advantages**:
   - Data efficient: only a small amount of data is needed to train an effective LM-Steer.
   - Parameter efficient: the added parameters amount to a small fraction of the original LM's size.
   - Decoding efficient: decoding speed is close to that of the original LM.

Beyond controlling generation style, LM-Steer also provides an interpretive lens on word embeddings.
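The "about 0.2% of the original LM's size" figure can be sanity-checked with a parameter count. This sketch assumes the steer is a single full d×d matrix and uses the published GPT-2 configurations (tied input/output embeddings, 50,257-token vocabulary, 1,024 positions) purely as an example; the summary does not say which model the figure refers to, and the helpers `gpt2_param_count` and `steer_fraction` are hypothetical.

```python
def gpt2_param_count(d, n_layers, vocab=50257, n_positions=1024):
    """Parameter count of a GPT-2-style decoder with tied embeddings.

    Each block has two LayerNorms (2d each), a fused QKV projection
    (d*3d + 3d), an attention output projection (d*d + d), and a 4x MLP
    (d*4d + 4d, then 4d*d + d), for 12d^2 + 13d parameters per block.
    """
    per_block = 12 * d * d + 13 * d
    token_emb = vocab * d        # shared with the output layer (tied)
    pos_emb = n_positions * d
    final_ln = 2 * d
    return n_layers * per_block + token_emb + pos_emb + final_ln


def steer_fraction(d, n_layers):
    """Size of one d x d steering matrix relative to the base model."""
    return (d * d) / gpt2_param_count(d, n_layers)


for name, d, n_layers in [("GPT-2 Small", 768, 12), ("GPT-2 Large", 1280, 36)]:
    print(f"{name}: {gpt2_param_count(d, n_layers):,} params, "
          f"steer = {100 * steer_fraction(d, n_layers):.2f}%")
# GPT-2 Small: 124,439,808 params, steer = 0.47%
# GPT-2 Large: 774,030,080 params, steer = 0.21%
```

Under these assumptions a d×d steer is about 0.21% of GPT-2 Large but about 0.47% of GPT-2 Small, so the exact percentage would depend on which model is being steered.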
