Investigating Gender Bias in Turkish Language Models

Orhun Caglidil, Malte Ostendorff, Georg Rehm
2024-04-18
Abstract: Language models are trained mostly on Web data, which often contains social stereotypes and biases that the models can inherit. This has potentially negative consequences, as models can amplify these biases in downstream tasks or applications. However, prior research has primarily focused on the English language, especially in the context of gender bias. In particular, grammatically gender-neutral languages such as Turkish are underexplored despite representing different linguistic properties to language models with possibly different effects on biases. In this paper, we fill this research gap and investigate the significance of gender bias in Turkish language models. We build upon existing bias evaluation frameworks and extend them to the Turkish language by translating existing English tests and creating new ones designed to measure gender bias in the context of Türkiye. Specifically, we also evaluate Turkish language models for their embedded ethnic bias toward Kurdish people. Based on the experimental results, we attribute possible biases to different model characteristics such as the model size, their multilingualism, and the training corpora. We make the Turkish gender bias dataset publicly available.
Computation and Language
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper explores gender bias in Turkish language models and aims to fill the research gap on gender bias in languages without grammatical gender, such as Turkish.

#### Specific Objectives

1. **Assess Gender Bias**: The paper extends existing bias assessment frameworks to Turkish to measure gender bias in language models.
2. **Detect Ethnic Bias**: It also evaluates Turkish models for ethnic bias against Kurds.
3. **Impact of Model Characteristics**: Based on the experimental results, potential biases are attributed to model characteristics such as model size, multilingualism, and training corpora.

#### Research Methods

1. **Assessment Framework**:
   - Evaluations are conducted using the Word Embedding Association Test (WEAT) and the Sentence Encoder Association Test (SEAT).
   - English test data is translated into Turkish and adapted to the local context.
2. **Empirical Analysis**:
   - Various monolingual and multilingual models are evaluated on Turkish, including BERTurk, mBERT, and mT5.
   - The models are compared with respect to gender bias and profession-related bias.

#### Main Findings

1. **Gender Bias**:
   - Sentence-level tests reveal gender bias more effectively than word-level tests.
   - Monolingual models show more bias associations in name tests, whereas multilingual models show the opposite.
2. **Ethnic Bias**:
   - The association tests between Turkish and Kurdish names and pleasant/unpleasant attributes yielded no significant results, possibly because Kurdish names occur too rarely in the corpus.
3. **Impact of Model Characteristics**:
   - Monolingual models appear "less biased" than multilingual models, but multilingual models can mitigate some biases through joint training on multiple languages.
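The association tests named under Research Methods can be illustrated with a minimal sketch of the standard WEAT effect size: each target vector is scored by how much closer it lies (in cosine similarity) to one attribute set than to the other, and the difference between the two target groups is normalized by the spread of those scores. This is not the paper's code. The function names are hypothetical, the 2-D vectors are toy placeholders rather than real model embeddings, and the use of the sample standard deviation is an assumption. For SEAT, the same computation would presumably run on sentence encodings instead of word vectors.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean similarity of w to set A minus mean similarity to set B."""
    sim_a = [cosine(w, a) for a in attrs_a]
    sim_b = [cosine(w, b) for b in attrs_b]
    return mean(sim_a) - mean(sim_b)


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations of X and Y, divided by the
    standard deviation of the associations over all targets.
    Assumption: sample standard deviation (n - 1 in the denominator)."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy placeholder vectors (hypothetical, not taken from any model).
X = [[1.0, 0.1], [0.9, 0.2]]   # e.g. vectors for one group of names
Y = [[0.1, 1.0], [0.2, 0.9]]   # e.g. vectors for the other group of names
A = [[1.0, 0.0]]               # e.g. vectors for one attribute set
B = [[0.0, 1.0]]               # e.g. vectors for the other attribute set

d = weat_effect_size(X, Y, A, B)
print(f"effect size: {d:.3f}")  # positive: X leans toward A, Y toward B
```

A positive value indicates that the first target set is more strongly associated with the first attribute set. Swapping the two target sets flips the sign.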
#### Conclusion

This paper reveals the issues of gender and ethnic bias in Turkish language models through systematic evaluation methods and proposes some improvements and bias mitigation strategies. These findings contribute to a better understanding of the performance of language models in different linguistic environments and provide valuable references for future research.