Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese Media Bias Detection

Luyang Lin,Jing Li,Kam-Fai Wong

2023-11-18

Abstract: With the increasing pursuit of objective reporting, automatically understanding media bias has drawn more attention in recent research. However, most previous work examines media bias through Western ideology, such as the left and right of the political spectrum, which is not applicable to Chinese outlets. Building on the existing structure of lexical bias and informational bias, we refine it from the Chinese perspective and go one step further to craft data with 7 fine-grained labels. Specifically, we first construct a dataset of Chinese news reports about COVID-19, annotated with our newly designed system, and then conduct substantial experiments on it to detect media bias. However, the scale of the annotated data is not sufficient for the latest deep-learning technology, and human annotation of media bias, which requires considerable professional knowledge, is too expensive. Thus, we explore context enrichment methods to mitigate these problems automatically. In Data-Augmented Context Enrichment (DACE), we enlarge the training data, while in Retrieval-Augmented Context Enrichment (RACE), we improve information retrieval methods to select valuable information and integrate it into our models to better understand bias. Extensive experiments are conducted on both our dataset and an English dataset, BASIL. Our results show that both methods outperform our baselines, while the RACE methods are more efficient and have more potential.
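
The retrieval idea behind RACE can be illustrated with a minimal sketch: given a target sentence, pick the most similar sentences from a pool and append them as extra context before classification. This is not the authors' implementation. The bag-of-words cosine similarity, the `[SEP]` separator, and the names `tokenize`, `cosine`, and `enrich` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization, for illustration only. Chinese text would
    # need a word segmenter or character-level tokens instead.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def enrich(target, candidates, k=2, sep=" [SEP] "):
    # Hypothetical helper: rank candidate sentences by similarity to the
    # target and append the top k as additional context.
    target_vec = Counter(tokenize(target))
    ranked = sorted(
        candidates,
        key=lambda c: cosine(target_vec, Counter(tokenize(c))),
        reverse=True,
    )
    return sep.join([target] + ranked[:k])


pool = [
    "the vaccine trial reported strong results",
    "stock markets fell on monday",
    "officials praised the vaccine rollout",
]
enriched = enrich("the vaccine rollout was criticized", pool, k=1)
print(enriched)
```

The enriched string would then be passed to a classifier in place of the bare target sentence. A real system would likely use a stronger retriever than word overlap.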

Computers and Society

What problem does this paper attempt to address?

The paper primarily aims to address the following issues:

1. **Refining the Media Bias Annotation Structure**: For the task of Chinese media bias detection, the authors propose a more refined media bias annotation structure and contribute a Chinese health-domain media bias dataset with a multi-label setting, together with its benchmark.
2. **Applying Pre-trained Models to Media Bias Detection**: By fine-tuning pre-trained models on the media bias dataset, the authors study the differences between lexical bias detection and informational bias detection.
3. **Proposing Data-Augmented and Retrieval-Augmented Context Enrichment Methods**: To address insufficient annotated data and high manual annotation costs, the authors propose two methods, Data-Augmented Context Enrichment (DACE) and Retrieval-Augmented Context Enrichment (RACE), which aim to mitigate these issues automatically and improve performance on media bias detection.

Specifically, the goals of the paper are:

- To design a media bias annotation system suitable for the Chinese context, construct the corresponding dataset, and establish a multi-label classification benchmark based on it.
- To explore how to fine-tune pre-trained models to effectively detect different types of media bias.
- To propose two context enrichment methods, data augmentation and retrieval augmentation, that improve model performance on small-scale datasets, and to validate their effectiveness.

Through these efforts, the paper aims to advance automatic detection of Chinese media bias and to provide references and tools for research in related fields.
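
To make the multi-label setting concrete, here is a small sketch of how a sentence carrying several bias labels might be encoded and scored. The seven label names are placeholders, since the summary does not list them. Micro-averaged F1 is one common choice of metric and is an assumption here, not necessarily the metric the paper reports.

```python
# Placeholder names: the source says there are 7 fine-grained labels
# but does not name them in this summary.
LABELS = [f"bias_type_{i}" for i in range(1, 8)]


def to_multi_hot(active):
    # Encode a set of active label names as a 0/1 vector over LABELS.
    return [1 if name in active else 0 for name in LABELS]


def micro_f1(gold, pred):
    # Micro-averaged F1: pool true positives, false positives, and
    # false negatives over all sentences and all labels.
    tp = fp = fn = 0
    for gold_row, pred_row in zip(gold, pred):
        for g, p in zip(gold_row, pred_row):
            if g == 1 and p == 1:
                tp += 1
            elif g == 0 and p == 1:
                fp += 1
            elif g == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# A sentence may carry several labels at once, or none at all.
gold = [to_multi_hot({"bias_type_1", "bias_type_3"}), to_multi_hot(set())]
pred = [to_multi_hot({"bias_type_1"}), to_multi_hot({"bias_type_2"})]
score = micro_f1(gold, pred)
print(score)
```

Unlike single-label classification, each label is decided independently, so a model typically produces one score per label and thresholds each one.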

Target-Aware Contextual Political Bias Detection in News

CDAIL-BIAS MEASURER: A Model Ensemble Approach for Dialogue Social Bias Measurement

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

NewsUnfold: Creating a News-Reading Application That Indicates Linguistic Media Bias and Collects Feedback

An Interdisciplinary Approach for the Automated Detection and Visualization of Media Bias in News Articles

Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions

Unlocking Bias Detection: Leveraging Transformer-Based Models for Content Analysis

CHBias: Bias Evaluation and Mitigation of Chinese Conversational Language Models

Detecting and Reducing Bias in a High Stakes Domain

Towards Identifying Social Bias in Dialog Systems: Framework, Dataset, and Benchmark

Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks

BiasScanner: Automatic Detection and Classification of News Bias to Strengthen Democracy

Modeling Multi-level Context for Informational Bias Detection by Contrastive Learning and Sentential Graph Network

BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs

Adoption and implication of the Biased-Annotator Competence Estimation (BACE) model into COVID-19 vaccine Twitter data: Human annotation for latent message features

HateDebias: On the Diversity and Variability of Hate Speech Debiasing

IndiTag: An Online Media Bias Analysis and Annotation System Using Fine-Grained Bias Indicators

Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering

Mitigating Social Biases of Pre-trained Language Models via Contrastive Self-Debiasing with Double Data Augmentation

Balancing Transparency and Accuracy: A Comparative Analysis of Rule-Based and Deep Learning Models in Political Bias Classification