Abstract: Language models are trained on large-scale corpora that embed implicit biases documented in psychology. Valence associations (pleasantness/unpleasantness) of social groups determine the biased attitudes towards groups and concepts in social cognition. Building on this established literature, we quantify how social groups are valenced in English language models using a sentence template that provides an intersectional context. We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. We present a concept projection approach to capture the valence subspace through contextualized word embeddings of language models. Adapting the projection-based approach to embedding association tests that quantify bias, we find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language. We find that the largest and better-performing model that we study is also more biased as it effectively captures bias embedded in sociocultural data. We validate the bias evaluation method by overperforming on an intrinsic valence evaluation task. The approach enables us to measure complex intersectional biases as they are known to manifest in the outputs and applications of language models that perpetuate historical biases. Moreover, our approach contributes to design justice as it studies the associations of groups underrepresented in language such as transgender and homosexual individuals.

What problem does this paper attempt to address?

The paper addresses how to evaluate the biased attitude associations that language models hold about social groups and concepts in an intersectional context. Specifically, it focuses on the following points:

1. **Quantifying the Valence of Social Groups**: Implicit biases documented in psychology become embedded in language models during training. Valence associations (pleasantness/unpleasantness) determine the biased attitudes towards different social groups and concepts. The paper quantifies how social groups are valenced in English language models by using a sentence template that provides an intersectional context.
2. **Studying Intersectional Biases across Multiple Social Characteristics**: The paper studies biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. The authors propose a concept projection method that captures the valence subspace through contextualized word embeddings in order to measure these biases.
3. **Designing Bias Tests Applicable to Dynamic Representations**: Static word embeddings have been widely studied, but modern natural language processing relies more on contextualized language models. The embeddings these models generate change with context, so bias tests designed for static word embeddings do not apply directly. The paper proposes a maximum-margin subspace learning method based on support vector classifiers to isolate the valence subspace in the contextualized embedding space, and measures biases on that basis.
4. **Validating the Bias Evaluation Method**: The authors validate the proposed bias evaluation method on an intrinsic valence evaluation task. The method can also measure complex intersectional biases, which are known to manifest in the outputs and applications of language models and to perpetuate historical biases.
5. **Promoting Design Justice**: By studying the associations of groups that are underrepresented in language, such as transgender and homosexual individuals, the method contributes to design justice.

### Main Contributions

1. **Maximum-Margin Subspace Learning Method**: A maximum-margin subspace learning method is proposed to isolate the valence subspace in the contextualized embedding space. The method was tested on five different language models, and the results show that it is robust in highly contextualized and anisotropic embedding spaces and outperforms methods based on cosine similarity.
2. **Statistical Bias Measurement**: A statistical bias measurement based on the Word Embedding Association Test (WEAT) is introduced to study the differential biases that arise in language models from the contextualization process. Experimental results show significant biases related to gender identity, social class, and sexual orientation.
3. **Bias Analysis without Binary Differential Tests**: A way of studying bias that does not require a binary differential test is proposed. Sentences containing social group signals are generated, and the embedding of the word "person" in each sentence is projected onto the maximum-margin subspace, which identifies the most pleasant and most unpleasant sentences. Experimental results again reflect significant biases related to sexual orientation and gender identity.

### Summary

The paper aims to solve the problem of how to evaluate social biases in language models in an intersectional context. The methods it introduces measure these biases effectively and provide tools and directions for future research.
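The projection-based measurement described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a valence direction has already been learned (for example, the weight vector of a linear support vector classifier trained to separate pleasant from unpleasant words), and it uses a WEAT-style standardized difference of projections as the bias statistic. The function names, the use of the sample standard deviation, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math
import statistics


def project(embedding, direction):
    """Scalar projection of an embedding onto the (normalized) valence direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(e * d for e, d in zip(embedding, direction)) / norm


def effect_size(group_x, group_y, direction):
    """WEAT-style effect size computed on projections instead of cosine similarities.

    Difference of the two groups' mean projections, divided by the sample
    standard deviation of all projections (an assumed normalization).
    """
    px = [project(e, direction) for e in group_x]
    py = [project(e, direction) for e in group_y]
    return (statistics.mean(px) - statistics.mean(py)) / statistics.stdev(px + py)


def rank_by_valence(embeddings_by_sentence, direction):
    """Sort sentences from most unpleasant to most pleasant by projection."""
    return sorted(
        embeddings_by_sentence,
        key=lambda s: project(embeddings_by_sentence[s], direction),
    )


# Toy data: hypothetical 2-d stand-ins for contextualized embeddings of "person".
direction = [1.0, 0.0]  # assumed, pre-learned valence direction
group_x = [[0.9, 0.1], [0.7, -0.2]]
group_y = [[-0.8, 0.3], [-0.6, 0.0]]
sentences = {"sentence A": [0.9, 0.1], "sentence B": [-0.8, 0.3]}

d = effect_size(group_x, group_y, direction)
ranking = rank_by_valence(sentences, direction)
```

A positive `d` means the first group projects more towards the pleasant side of the assumed direction than the second. The ranking helper needs no paired comparison group, which matches the idea of identifying the most pleasant and most unpleasant sentences directly from their projections.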

Evaluating Biased Attitude Associations of Language Models in an Intersectional Context
