Abstract: The statistical regularities in language corpora encode well-known social biases into word embeddings. Here, we focus on gender to provide a comprehensive analysis of group-based biases in widely used static English word embeddings trained on internet corpora (GloVe 2014, fastText 2017). Using the Single-Category Word Embedding Association Test, we demonstrate widespread gender biases that also show differences in: (a) frequencies of words associated with men versus women; (b) part-of-speech tags in gender-associated words; (c) semantic categories in gender-associated words; and (d) valence, arousal, and dominance in gender-associated words. First, in terms of word frequency: we find that, of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than women, providing direct evidence of a masculine default in the everyday language of the English-speaking world. Second, gender biases in embeddings also permeate parts of speech: the top male-associated words are typically verbs (e.g., fight, overpower), while the top female-associated words are typically adjectives and adverbs (e.g., giving, emotionally). Third, for semantic categories: bottom-up cluster analyses of the top 1,000 words associated with each gender show that the top male-associated concepts include roles and domains of big tech, engineering, religion, sports, and violence; in contrast, the top female-associated concepts are less focused on roles, including, instead, female-specific slurs and sexual content, as well as appearance and kitchen terms. Fourth, using human ratings of word valence, arousal, and dominance from a ~20,000-word lexicon, we find that male-associated words are higher on arousal and dominance, while female-associated words are higher on valence.
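
As a rough illustration of the kind of measurement the abstract refers to, the sketch below computes a single-category association score for one word against two attribute sets. It assumes the commonly used effect-size form: the difference between the word's mean cosine similarity to each attribute set, divided by the sample standard deviation of its similarities to both sets combined. The function names and the two-dimensional toy vectors are hypothetical and made up for illustration; they are not taken from the paper, which works with GloVe and fastText embeddings.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sc_weat_effect_size(word_vec, attrs_a, attrs_b):
    """Association of one word with attribute set A relative to set B.

    Assumed form: (mean similarity to A - mean similarity to B) divided by
    the sample standard deviation of similarities to A and B combined.
    Positive values mean the word sits closer to A, negative closer to B.
    """
    sims_a = [cosine(word_vec, a) for a in attrs_a]
    sims_b = [cosine(word_vec, b) for b in attrs_b]
    return (mean(sims_a) - mean(sims_b)) / stdev(sims_a + sims_b)


# Toy 2-d vectors, invented for illustration only (not real embeddings).
male_attrs = [[1.0, 0.1], [0.9, 0.2]]
female_attrs = [[0.1, 1.0], [0.2, 0.9]]
word_near_male = [0.8, 0.3]
word_near_female = [0.3, 0.8]

score_m = sc_weat_effect_size(word_near_male, male_attrs, female_attrs)
score_f = sc_weat_effect_size(word_near_female, male_attrs, female_attrs)
print(f"near-male word:   {score_m:+.3f}")
print(f"near-female word: {score_f:+.3f}")
```

With real embeddings, the attribute sets would be lists of vectors for male and female terms, and the score would be computed for every word in the vocabulary before ranking or counting words by the sign of their association.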

Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings

Examining Gender Bias in Languages with Grammatical Gender

Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics

Measuring Gender Bias in Word Embeddings of Gendered Languages Requires Disentangling Grammatical Gender Signals

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

Mitigating Gender Bias in Contextual Word Embeddings

Grammatical Gender, Neo-Whorfianism, and Word Embeddings: A Data-Driven Approach to Linguistic Relativity

A World Full of Stereotypes? Further Investigation on Origin and Gender Bias in Multi-Lingual Word Embeddings

How does Grammatical Gender Affect Noun Representations in Gender-Marking Languages?

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Gender Stereotypes in Natural Language: Word Embeddings Show Robust Consistency Across Child and Adult Language Corpora of More Than 65 Million Words

Gender Bias in Contextualized Word Embeddings

Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias

Examining Covert Gender Bias: A Case Study in Turkish and English Machine Translation Models

Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs

Analysis of Gender Bias in Social Perception and Judgement Using Chinese Word Embeddings

Interpreting Gender Bias in Neural Machine Translation: Multilingual Architecture Matters

Gauging the Impact of Gender Grammaticization in Different Languages: Application of a Linguistic-Visual Paradigm

Gender Inflected or Bias Inflicted: On Using Grammatical Gender Cues for Bias Evaluation in Machine Translation

Investigating Gender Bias in Turkish Language Models

Attenuating Bias in Word Vectors