Wikigender: A Machine Learning Model to Detect Gender Bias in Wikipedia

Natalie Bolón Brun, Sofia Kypraiou, Natalia Gullón Altés, Irene Petlacalco Barrios
DOI: https://doi.org/10.48550/arXiv.2211.07520
2022-11-15
Abstract: The way Wikipedia's contributors think can influence how they describe individuals resulting in a bias based on gender. We use a machine learning model to prove that there is a difference in how women and men are portrayed on Wikipedia. Additionally, we use the results of the model to obtain which words create bias in the overview of the biographies of the English Wikipedia. Using only adjectives as input to the model, we show that the adjectives used to portray women have a higher subjectivity than the ones used to describe men. Extracting topics from the overview using nouns and adjectives as input to the model, we obtain that women are related to family while men are related to business and sports.
Computers and Society
What problem does this paper attempt to address?
This paper investigates whether gender bias exists in Wikipedia and what forms it takes. The authors use machine-learning models to show that men and women are described differently in Wikipedia biographies, and they expose those differences by analyzing word choice in the articles. The paper focuses on four aspects:

1. **Detection of gender bias**: A machine-learning model predicts the gender of the person described in a biography, to test whether language use differs by gender.
2. **Analysis of vocabulary selection**: The authors pay special attention to adjectives, analyzing their subjectivity and sentiment to reveal language differences between biographies of women and men.
3. **Analysis of topic bias**: Topics are extracted from the overview section of each biography and compared across genders. For example, biographies of women are more often related to family, while biographies of men are more often related to business and sports.
4. **Bias in professional fields**: The authors further analyze gender bias within different professional fields and explore whether some professions show more pronounced bias than others.

Through these analyses, the authors aim to reveal the specific ways gender bias appears in Wikipedia and to provide data and a methodological basis for reducing it.
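The idea of predicting a label from adjectives and then reading off which words drive the prediction can be sketched in a few lines. This is a minimal illustration only, assuming a multinomial Naive Bayes classifier with add-one smoothing, which is a stand-in technique and not necessarily the model used in the paper. The function names, the class labels `"a"` and `"b"` (placeholders for the two gender classes), and the toy adjective lists are all invented for this sketch and do not reflect the paper's data or findings.

```python
import math
from collections import Counter

# Invented toy data: (class label, adjectives from one overview).
# Labels "a" and "b" are placeholders; the words carry no real-world meaning.
TOY_DOCS = [
    ("a", ["early", "local", "early"]),
    ("a", ["early", "public"]),
    ("b", ["late", "local", "late"]),
    ("b", ["late", "private"]),
]


def train_naive_bayes(docs, alpha=1.0):
    """Fit multinomial Naive Bayes with additive smoothing `alpha`."""
    label_counts = Counter(label for label, _ in docs)
    word_counts = {label: Counter() for label in label_counts}
    for label, words in docs:
        word_counts[label].update(words)
    vocab = sorted({w for _, words in docs for w in words})

    log_prior = {l: math.log(c / len(docs)) for l, c in label_counts.items()}
    log_like = {}
    for label, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[label] = {
            w: math.log((counts[w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like, vocab


def predict(model, words):
    """Return the most probable label; words outside the vocabulary are skipped."""
    log_prior, log_like, _ = model
    scores = {
        label: log_prior[label]
        + sum(log_like[label][w] for w in words if w in log_like[label])
        for label in log_prior
    }
    return max(scores, key=scores.get)


def indicative_words(model, label_a, label_b):
    """Rank vocabulary from most indicative of label_a to most indicative of label_b."""
    _, log_like, vocab = model
    ratio = {w: log_like[label_a][w] - log_like[label_b][w] for w in vocab}
    return sorted(vocab, key=lambda w: ratio[w], reverse=True)
```

Calling `train_naive_bayes(TOY_DOCS)` and then `indicative_words(model, "a", "b")` orders the vocabulary by log-likelihood ratio, so the words at either end of the list are the ones that most separate the two classes. In a real study the input would be adjectives extracted from biography overviews by a part-of-speech tagger, a step this sketch leaves out.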