Abstract: Recently there has been growing concern in academia, industrial research laboratories and the mainstream commercial media about the phenomenon dubbed <em>machine bias</em>, where trained statistical models, unbeknownst to their creators, grow to reflect controversial societal asymmetries, such as gender or racial bias. A significant number of Artificial Intelligence tools have recently been suggested to be harmfully biased toward some minority, with reports of racist criminal behavior predictors, Apple's iPhone X failing to differentiate between two distinct Asian people and the now infamous case of Google Photos mistakenly classifying black people as gorillas. Although a systematic study of such biases can be difficult, we believe that automated translation tools can be exploited through gender-neutral languages to yield a window into the phenomenon of gender bias in AI. In this paper, we start with a comprehensive list of job positions from the U.S. Bureau of Labor Statistics (BLS) and use it to build sentences in constructions like "He/She is an Engineer" (where "Engineer" is replaced by the job position of interest) in 12 gender-neutral languages, including Hungarian, Chinese, and Yoruba. We translate these sentences into English using the Google Translate API, and collect statistics about the frequency of female, male and gender-neutral pronouns in the translated output. We then show that Google Translate exhibits a strong tendency toward male defaults, in particular for fields typically associated with unbalanced gender distribution or stereotypes, such as STEM (Science, Technology, Engineering and Mathematics) jobs. We compare these statistics against BLS data on the frequency of female participation in each job position, and show that Google Translate fails to reproduce the real-world distribution of female workers.
In summary, we provide experimental evidence that even if one does not expect in principle a 50:50 pronominal gender distribution, Google Translate yields male defaults much more frequently than would be expected from demographic data alone. We believe that our study can shed further light on the phenomenon of machine bias and are hopeful that it will ignite a debate about the need to augment current statistical translation tools with debiasing techniques, which can already be found in the scientific literature.
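
The pronoun-counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`classify_pronoun`, `pronoun_counts`, `female_share`), the pronoun sets, and the sample sentences are all assumptions made for illustration, and the translation step itself is left out, so the input is simply a list of already-translated English sentences.

```python
import re
from collections import Counter

# Assumed pronoun sets; the paper may group pronouns differently.
MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}
NEUTRAL = {"they", "them", "their", "it"}


def classify_pronoun(sentence: str) -> str:
    """Label a sentence by the first pronoun it contains."""
    for token in re.findall(r"[a-z']+", sentence.lower()):
        if token in MALE:
            return "male"
        if token in FEMALE:
            return "female"
        if token in NEUTRAL:
            return "neutral"
    return "none"


def pronoun_counts(translations):
    """Tally pronoun labels over a list of translated sentences."""
    return Counter(classify_pronoun(s) for s in translations)


def female_share(counts):
    """Fraction of gendered outputs that are female, or None if there are none."""
    gendered = counts["male"] + counts["female"]
    return counts["female"] / gendered if gendered else None


# Hypothetical translated outputs, not data from the study.
sample = [
    "He is an engineer.",
    "She is a nurse.",
    "He is a doctor.",
    "They are a teacher.",
]
counts = pronoun_counts(sample)
print(counts["male"], counts["female"], counts["neutral"])
print(female_share(counts))
```

The resulting female share per occupation is the kind of quantity that could then be set against the BLS female participation rate for the same occupation.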

How to measure gender bias in machine translation: Real-world oriented machine translators, multiple reference points

How to Measure Gender Bias in Machine Translation: Optimal Translators, Multiple Reference Points

Assessing gender bias in machine translation: a case study with Google Translate

Evaluating Gender Bias in the Translation of Gender-Neutral Languages into English

Investigating Markers and Drivers of Gender Bias in Machine Translations

Evaluating Gender Bias in Machine Translation

On Measuring Gender Bias in Translation of Gender-neutral Pronouns

Extending Challenge Sets to Uncover Gender Bias in Machine Translation: Impact of Stereotypical Verbs and Adjectives

Examining Covert Gender Bias: A Case Study in Turkish and English Machine Translation Models

Gender Bias in Machine Translation

Good, but not always Fair: An Evaluation of Gender Bias for three Commercial Machine Translation Systems

Evaluating Gender Bias in Hindi-English Machine Translation

Gender Inflected or Bias Inflicted: On Using Grammatical Gender Cues for Bias Evaluation in Machine Translation

Gender Bias in Online Language Translators: Visualization, Human Perception, and Bias/Accuracy Tradeoffs

Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words

Machine Translation and Gender biases in video game localisation: a corpus-based analysis

Whose wife is it anyway? Assessing bias against same-gender relationships in machine translation

A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation

Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation

What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study