Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese Media Bias Detection

Luyang Lin, Jing Li, Kam-Fai Wong
2023-11-18
Abstract: With the growing demand for objective reporting, automatically understanding media bias has drawn increasing attention in recent research. However, most previous work examines media bias through Western ideological framings, such as the left and right of the political spectrum, which do not apply to Chinese outlets. Building on the existing lexical bias and informational bias structure, we refine it from the Chinese perspective and go one step further to craft data with 7 fine-grained labels. Specifically, we first construct a dataset of Chinese news reports about COVID-19, annotated with our newly designed system, and then conduct substantial experiments on it to detect media bias. However, the annotated data is too small for the latest deep-learning technology, and human annotation of media bias, which requires considerable professional knowledge, is expensive. We therefore explore context enrichment methods to mitigate these problems automatically. In Data-Augmented Context Enrichment (DACE), we enlarge the training data, while in Retrieval-Augmented Context Enrichment (RACE), we improve information retrieval methods to select valuable information and integrate it into our models to better understand bias. Extensive experiments are conducted on both our dataset and an English dataset, BASIL. Our results show that both methods outperform our baselines, while the RACE methods are more efficient and have more potential.
Computers and Society
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:

1. **Refining Media Bias Annotation Structure**: For the task of Chinese media bias detection, the authors propose a more refined media bias annotation structure and contribute a Chinese health-domain media bias dataset with a multi-label setting, along with its benchmark.
2. **Application of Pre-trained Models in Media Bias Detection**: By fine-tuning pre-trained models on the media bias dataset, the authors study the differences between lexical bias detection and informational bias detection.
3. **Proposing Data-Augmented and Retrieval-Augmented Context Enrichment Methods**: To address insufficient annotated data and the high cost of manual annotation, the authors propose two methods: Data-Augmented Context Enrichment (DACE) and Retrieval-Augmented Context Enrichment (RACE). Both aim to mitigate these issues automatically and improve performance on media bias detection.

Specifically, the goals of the paper are:

- To design a media bias annotation system suitable for the Chinese context, construct the corresponding dataset, and establish a multi-label classification benchmark based on it.
- To explore how to fine-tune pre-trained models to effectively detect different types of media bias.
- To propose two context enrichment methods, one based on data augmentation and one on retrieval augmentation, to improve model performance on small-scale datasets, and to validate their effectiveness.

Through these efforts, the paper aims to advance automatic detection of Chinese media bias and to provide references and tools for research in related fields.
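
The retrieval-augmented idea described above (select valuable context and integrate it into the model input) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine similarity, the function names `retrieve_context` and `enrich`, the `[SEP]` separator, and the choice of `k` are all assumptions made for illustration, and the paper's actual retrieval method may differ substantially.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens (assumption: whitespace-delimited text;
    # Chinese input would need a word segmenter instead).
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_context(target, candidates, k=2):
    # Rank candidate sentences by similarity to the target sentence
    # and keep at most k of them, dropping zero-overlap candidates.
    target_vec = Counter(tokenize(target))
    scored = [
        (cosine(target_vec, Counter(tokenize(c))), c)
        for c in candidates
        if c != target
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def enrich(target, candidates, k=2, sep=" [SEP] "):
    # Hypothetical integration step: append the retrieved context to the
    # target sentence so a classifier sees both in a single input string.
    return sep.join([target] + retrieve_context(target, candidates, k))


article = [
    "The city reported new COVID-19 cases on Monday.",
    "Officials praised the heroic response to new COVID-19 cases.",
    "The weather was sunny.",
]
enriched_input = enrich(article[1], article, k=1)
```

The enriched string would then be passed to a fine-tuned pre-trained classifier in place of the bare target sentence. A similarity-based retriever is only one plausible choice here; it stands in for whatever selection method the paper actually uses.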