Offensiveness Analysis of Chinese Group Addressing Terms and Dataset Construction

Shucheng Zhu, Ying Liu
DOI: https://doi.org/10.1007/978-981-97-0586-3_27
2024-01-01
Abstract: Group addressing terms are a linguistic phenomenon commonly used to reference groups in everyday speech. These terms not only reflect the cultural nuances within a language but also serve as valuable keywords in natural language processing for examining various instances of bias and discrimination against disadvantaged groups in artificial intelligence. This paper presents a comprehensive Chinese group addressing terms dataset, constructed by collecting and annotating 2,483 such terms from diverse sources. The dataset encompasses 10 categories, including gender, race, and religion. Subsequently, the offensiveness of these group addressing terms is annotated through a combination of expert evaluations and crowdsourcing. In general, factors such as gender, age, educational background, and empathy do not exhibit a significant correlation with the perception of offensive group addressing terms. However, there are discernible differences in the perception of offensiveness when individuals evaluate terms that relate to their own respective groups. Offensiveness in group addressing terms shows both commonalities across different categories and distinctive characteristics unique to specific categories. Various linguistic traits can either amplify or diminish the perceived offensiveness. Beyond serving as a means of catharsis, offensive group addressing terms can also play a role in identity construction. When different group addressing terms are used as prompts, the text generated by language models reveals certain biases and stereotypes towards particular groups. In the future, this dataset can be leveraged not only for sociolinguistic research but also for the creation of fairness datasets in the field of natural language processing.