Generating Gender Alternatives in Machine Translation

Sarthak Garg, Mozhdeh Gheini, Clara Emmanuel, Tatiana Likhomanenko, Qin Gao, Matthias Paulik

2024-07-30

Abstract: Machine translation (MT) systems often translate terms with ambiguous gender (e.g., English term "the nurse") into the gendered form that is most prevalent in the systems' training data (e.g., "enfermera", the Spanish term for a female nurse). This often reflects and perpetuates harmful stereotypes present in society. With MT user interfaces in mind that allow for resolving gender ambiguity in a frictionless manner, we study the problem of generating all grammatically correct gendered translation alternatives. We open source train and test datasets for five language pairs and establish benchmarks for this task. Our key technical contribution is a novel semi-supervised solution for generating alternatives that integrates seamlessly with standard MT models and maintains high performance without requiring additional components or increasing inference overhead.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper primarily focuses on the issues arising in machine translation systems when dealing with gender-ambiguous vocabulary and proposes a novel method to generate all grammatically correct, gender-marked translation options. Specifically, the paper aims to address the following issues:

1. **Background and Motivation**: Current Machine Translation (MT) systems often translate gender-ambiguous terms (e.g., "nurse" in English) based on the most common form in the training data (e.g., translating "nurse" to "enfermera," which denotes a female nurse in Spanish). This approach not only reflects harmful stereotypes present in society but may also exacerbate these biases.
2. **Research Objectives**: To avoid making incorrect gender assignments when the context does not clarify the correct gender, the paper aims to develop a method capable of generating all valid and grammatically correct gendered translation options. This means that when encountering gender-ambiguous entities, the system should provide multiple translation versions to cover all possible gender choices.
3. **Technical Contributions**: The main technical contribution of the paper is a novel semi-supervised solution that can be seamlessly integrated into standard machine translation models, maintaining high performance without requiring additional components or increasing inference overhead. This method generates gendered translation options at the entity level rather than the traditional sentence level, ensuring that each entity has corresponding gender choices.
4. **Datasets and Benchmarks**: To support this research, the authors have open-sourced training and testing datasets for 5 language pairs and established benchmarks for this task. Additionally, they have expanded an existing test set from 3 language pairs to 6 language pairs.
5. **Quality Standards**: The paper defines several quality standards, including generating options only when necessary, ensuring all options maintain grammatical gender consistency, and ensuring that differences between options are limited to gender inflection changes.

In summary, the paper aims to improve the performance of machine translation systems in handling gender ambiguity by proposing a new method that provides multiple translation options to cover all possible gender choices.
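To make the idea of entity-level alternatives more concrete, here is a minimal Python sketch of one way such output could be represented and expanded into full sentences. It is an illustration only and not the paper's method or data format: the `GenderedSpan` type, the `render` and `all_alternatives` helpers, and the two-way masculine/feminine encoding are all assumptions introduced here. Each gender-dependent span is tied to an entity index, so every span that agrees with the same entity switches together, and the alternatives differ only in those spans.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Union


@dataclass(frozen=True)
class GenderedSpan:
    """Hypothetical container: a span whose form depends on one entity's gender."""
    entity: int      # index of the gender-ambiguous entity this span agrees with
    masculine: str   # surface form when the entity is masculine
    feminine: str    # surface form when the entity is feminine


# A translation is a sequence of fixed text and gender-dependent spans.
Segment = Union[str, GenderedSpan]


def render(segments: List[Segment], genders: Dict[int, str]) -> str:
    """Build one translation given a gender choice ("m" or "f") per entity."""
    parts = []
    for seg in segments:
        if isinstance(seg, GenderedSpan):
            parts.append(seg.masculine if genders[seg.entity] == "m" else seg.feminine)
        else:
            parts.append(seg)
    return "".join(parts)


def all_alternatives(segments: List[Segment]) -> List[str]:
    """Enumerate every combination of gender choices across the entities.

    With no gendered spans, exactly one translation is returned, which
    mirrors the criterion of offering alternatives only when necessary.
    """
    entities = sorted({s.entity for s in segments if isinstance(s, GenderedSpan)})
    return [
        render(segments, dict(zip(entities, choice)))
        for choice in product("mf", repeat=len(entities))
    ]


# Assumed example: Spanish translation of "The nurse is tired."
example = [
    GenderedSpan(0, "El enfermero", "La enfermera"),
    " está ",
    GenderedSpan(0, "cansado", "cansada"),
    ".",
]

for sentence in all_alternatives(example):
    print(sentence)
```

For the example above, the loop prints "El enfermero está cansado." followed by "La enfermera está cansada.". Because both spans share entity index 0, mixed forms such as "El enfermero está cansada." are never produced, which is one simple way to keep grammatical gender consistent within each alternative. A sentence with two independent ambiguous entities would yield four alternatives under this encoding.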


Evaluating Gender Bias in the Translation of Gender-Neutral Languages into English

Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German

Evaluating Gender Bias in Machine Translation

Gender Bias in Machine Translation

GATE: A Challenge Set for Gender-Ambiguous Translation Examples

Gender Neutralization for an Inclusive Machine Translation: from Theoretical Foundations to Open Challenges

Generating Gender Augmented Data for NLP

A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation

Getting Gender Right in Neural Machine Translation

Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus

Reducing Gender Bias in Machine Translation through Counterfactual Data Generation

GATE X-E : A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages

Extending Challenge Sets to Uncover Gender Bias in Machine Translation: Impact of Stereotypical Verbs and Adjectives

Assessing gender bias in machine translation: a case study with Google Translate

Good, but not always Fair: An Evaluation of Gender Bias for three Commercial Machine Translation Systems

Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words

Decoding and Diversity in Machine Translation

Participatory Research as a Path to Community-Informed, Gender-Fair Machine Translation

A Prompt Response to the Demand for Automatic Gender-Neutral Translation

Examining Covert Gender Bias: A Case Study in Turkish and English Machine Translation Models