Smi Hinterreiter, Timo Spinde, Sebastian Oberdörfer, Isao Echizen, Marc Erich Latoschik
Abstract: Recent research shows that visualizing linguistic bias mitigates its negative effects. However, reliable automatic detection methods to generate such visualizations require costly, knowledge-intensive training data. To facilitate data collection for media bias datasets, we present News Ninja, a game employing data-collecting game mechanics to generate a crowdsourced dataset. Before annotating sentences, players are educated on media bias via a tutorial. Our findings show that datasets gathered with crowdsourced workers trained on News Ninja can reach significantly higher inter-annotator agreements than expert and crowdsourced datasets with similar data quality. As News Ninja encourages continuous play, it allows datasets to adapt to the reception and contextualization of news over time, presenting a promising strategy to reduce data collection expenses, educate players, and promote long-term bias mitigation.
What problem does this paper attempt to address?
The paper addresses the challenge of automatically detecting linguistic bias in online news, and in particular the cost and complexity of creating large, high-quality training datasets. The authors introduce a gamified annotation tool called "News Ninja," which uses game mechanics to collect a crowdsourced dataset on media bias. Before starting the annotation task, players complete a tutorial on linguistic bias, which is intended to improve their ability to recognize it. The authors pose three research questions:
1. How can knowledge about linguistic bias be conveyed in an interactive and gamified manner?
2. How can gaming mechanics facilitate the annotation task?
3. Can the data generated by players achieve results comparable to those generated by experts?
News Ninja offers several game modes, including Publish mode and Critique mode, to help players identify and tag bias in text. In Publish mode, players swipe or tap the screen to mark whether a sentence is biased and can optionally tag specific words within the sentence. In Critique mode, players agree or disagree with annotations made by others.
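The two modes described above can be sketched as a small data model. This is a minimal illustration, not the game's actual implementation: the class names, field names, and the `aggregate_label` voting rule (each Critique vote counts as one extra vote for or against the label it reviews) are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PublishAnnotation:
    """Hypothetical record of one Publish-mode judgment:
    a sentence-level label plus optional word-level tags."""
    player_id: str
    sentence_id: str
    is_biased: bool
    tagged_words: list[str] = field(default_factory=list)


@dataclass
class CritiqueVote:
    """Hypothetical record of one Critique-mode judgment:
    agreement or disagreement with another player's annotation."""
    player_id: str
    target: PublishAnnotation
    agrees: bool


def aggregate_label(annotations, votes):
    """Majority vote for one sentence (an assumed aggregation rule).

    Each Publish annotation casts one vote for its own label. Each
    Critique vote casts one vote for the reviewed label if the player
    agrees, or for the opposite label if the player disagrees.
    Returns True (biased), False (not biased), or None on a tie.
    """
    tally = Counter()
    for annotation in annotations:
        tally[annotation.is_biased] += 1
    for vote in votes:
        reviewed = vote.target.is_biased
        tally[reviewed if vote.agrees else not reviewed] += 1
    if tally[True] == tally[False]:
        return None
    return tally[True] > tally[False]


# Toy example with made-up players and a made-up sentence ID.
a1 = PublishAnnotation("p1", "s1", True, ["slammed"])
a2 = PublishAnnotation("p2", "s1", False)
v1 = CritiqueVote("p3", a1, agrees=True)
print(aggregate_label([a1, a2], [v1]))
```

Keeping Critique votes as separate records, rather than overwriting the original annotation, preserves every individual judgment, which is what agreement statistics need later.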
The paper notes that manual bias annotation, although accurate when performed by experts, is costly, and that current Natural Language Processing (NLP) technology does not yet perform well enough for end-user applications. Gamified annotation is therefore a promising strategy: it reduces the cost of data collection and also educates players, promoting long-term bias mitigation. Because News Ninja encourages continuous play, the dataset can adapt to the reception and contextualization of news over time.
Comparing the data collected with News Ninja against expert-generated data, the study found that player-generated labels improved inter-annotator agreement by 10.28% over the baseline dataset, suggesting that News Ninja is a promising method for crowdsourcing expert-level linguistic bias labels. News Ninja also provides a graphical user interface, tutorials, five game modes, a feedback system, and a delayed feedback mechanism to maintain player engagement even in the absence of ground-truth labels.
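To make the agreement comparison concrete, the sketch below computes Fleiss' kappa, one common inter-annotator agreement statistic, along with a relative change between two scores. The summary does not state which agreement metric the paper uses, nor whether the 10.28% figure is a relative or an absolute change, so both the choice of metric and the `relative_change` helper are assumptions, and the input counts are toy values rather than data from the paper.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items.

    `ratings[i][j]` is the number of annotators who assigned item i to
    category j. Every item must be rated by the same number (>= 2) of
    annotators.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: mean share of agreeing annotator pairs per item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)


def relative_change(new, baseline):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


# Toy data: 4 sentences, 3 annotators each, columns = [biased, not biased].
toy = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(f"kappa = {fleiss_kappa(toy):.3f}")
```

Kappa corrects raw agreement for agreement expected by chance, which matters for bias labels because one class is often much more frequent than the other.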
In summary, the paper's main contribution is a gamified approach to the data collection challenge in linguistic bias detection that also raises public awareness of bias and improves players' skills in detecting it. By combining game design, education, and annotation tasks, News Ninja provides a sustainable solution for dataset creation in the field of Natural Language Processing.