A Multilingual Perspective on Probing Gender Bias

Karolina Stańczak
2024-03-16
Abstract: Gender bias represents a form of systematic negative treatment that targets individuals based on their gender. This discrimination can range from subtle sexist remarks and gendered stereotypes to outright hate speech. Prior research has revealed that ignoring online abuse not only affects the individuals targeted but also has broader societal implications. These consequences extend to the discouragement of women's engagement and visibility within public spheres, thereby reinforcing gender inequality. This thesis investigates the nuances of how gender bias is expressed through language and within language technologies. Significantly, this thesis expands research on gender bias to multilingual contexts, emphasising the importance of a multilingual and multicultural perspective in understanding societal biases. In this thesis, I adopt an interdisciplinary approach, bridging natural language processing with other disciplines such as political science and history, to probe gender bias in natural language and language models.
Computation and Language
What problem does this paper attempt to address?
The thesis explores how gender bias manifests in natural language and language technologies and how it can be measured, emphasizing the importance of studying gender bias in multilingual contexts. In her doctoral dissertation, Karolina Stańczak investigates how gender bias is expressed through language and reflected in language technologies, with particular attention to multilingual and multicultural perspectives. Key contributions include:

1. **Extending the study of gender bias to multilingual settings**: The thesis broadens gender bias research beyond English to multiple languages, which helps in understanding social biases across different cultural backgrounds.
2. **Constructing datasets for analysis**: The author created datasets from several domains, such as social media and historical newspapers, to analyze gender bias.
3. **Introducing new bias measurement methods**: The thesis introduces intersectional gender bias measurement methods and conducts a causal study of how the grammatical gender of nouns affects people's perceptions of those nouns.
4. **Developing new methods for probing language models**: The author proposes novel methods for probing language models for linguistic information and social biases.

For probing language models, the thesis proposes two dataset creation methods:
- one uses simple template structures to generate words directly adjacent to entity names, measuring the associations a language model holds with these entities;
- the other collects stereotypes and identities belonging to different social categories into a probing dataset, analyzing the associations a language model holds with social groups and the identities within them.

In summary, this thesis advances the understanding of the forms and extent of gender bias in natural language and language models, and contributes new research methods to the field.
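To make the two probing-dataset construction methods more concrete, here is a minimal sketch of how such probe sentences could be assembled. Everything in it is a hypothetical illustration, not the thesis's actual data or code: the templates, entity names, identities, and stereotypes are made up, and the function names `build_entity_probes` and `build_group_probes` are invented for this example. It only builds the probe sentences and does not query a model; it assumes the word next to each entity would later be predicted at a mask token such as BERT's `[MASK]` and then scored.

```python
from itertools import product

# Hypothetical templates (not from the thesis). "{entity}" is filled with a name
# or identity; "[MASK]" marks the slot directly next to it where a masked language
# model's predicted word would later be read off and scored.
TEMPLATES = [
    "{entity} is [MASK].",
    "{entity} was described as [MASK].",
]


def build_entity_probes(templates, entities):
    """Method 1 (sketch): one probe per (template, entity) pair, with the
    word to be predicted placed directly adjacent to the entity name."""
    return [
        {"entity": entity, "template": template, "text": template.format(entity=entity)}
        for template, entity in product(templates, entities)
    ]


def build_group_probes(categories):
    """Method 2 (sketch): pair every identity in a social category with every
    stereotype collected for that category."""
    probes = []
    for category, spec in categories.items():
        for identity, stereotype in product(spec["identities"], spec["stereotypes"]):
            probes.append(
                {
                    "category": category,
                    "identity": identity,
                    "stereotype": stereotype,
                    "text": f"{identity} are {stereotype}.",
                }
            )
    return probes


if __name__ == "__main__":
    # Illustrative inputs only; real probing data would come from curated lists.
    for probe in build_entity_probes(TEMPLATES, ["Angela Merkel", "Barack Obama"]):
        print(probe["text"])
    gender = {
        "gender": {
            "identities": ["Women", "Men"],
            "stereotypes": ["good leaders", "caring"],
        }
    }
    for probe in build_group_probes(gender):
        print(probe["category"], "|", probe["text"])
```

The sketch uses `itertools.product` so that every identity in a category is paired with every stereotype collected for it. Exhaustive pairing of this kind is what makes it possible to compare a model's associations across identities within the same social category.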