Large Language Models for Propaganda Span Annotation

Maram Hasanain,Fatema Ahmad,Firoj Alam

2024-10-06

Abstract: The use of propagandistic techniques in online content has increased in recent years, aiming to manipulate online audiences. Fine-grained propaganda detection and extraction of the textual spans where propaganda techniques are used are essential for more informed content consumption. Automatic systems targeting the task over lower-resourced languages are limited, usually obstructed by the lack of large-scale training datasets. Our study investigates whether Large Language Models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. The experiments are performed over a large-scale in-house manually annotated dataset. The results suggest that providing more annotation context to GPT-4 within prompts improves its performance compared to human annotators. Moreover, when serving as an expert annotator (consolidator), the model provides labels that have higher agreement with expert annotators and lead to specialized models that achieve state-of-the-art results on an unseen Arabic test set. Finally, our work is the first to show the potential of utilizing LLMs to develop annotated datasets for the propagandistic span detection task by prompting them with annotations from human annotators with limited expertise. All scripts and annotations will be shared with the community.

Computation and Language

What problem does this paper attempt to address?

The paper addresses the automatic detection and extraction of propaganda techniques in online content. Specifically, the authors focus on fine-grained propaganda detection at the level of text spans and investigate how well large language models (such as GPT-4) perform on this task. The main issues addressed in the paper are as follows:

1. **Automatic Detection and Extraction of Propaganda Techniques**: The use of propaganda techniques in online content is increasing, with the aim of manipulating online audiences. Fine-grained detection and extraction of the text spans where these techniques are used is therefore crucial for more informed content consumption.
2. **Development of Automatic Systems for Low-Resource Languages**: For low-resource languages, the development of automatic systems is often limited by the lack of large-scale training datasets. The paper explores whether large language models (such as GPT-4) can effectively extract propagandistic spans and be used to collect more cost-effective annotations.
3. **Using GPT-4 Generated Labels to Train Smaller Language Models**: The paper further investigates how effective labels provided by GPT-4 are for training smaller language models, particularly as measured on an Arabic test set.

Through these questions, the authors explore the potential of large language models for propaganda technique detection and annotation, and how these models can be leveraged to reduce the cost and effort of manual annotation.
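To make the "consolidator" setup more concrete, the following is a minimal sketch of how a paragraph and the spans proposed by several less experienced annotators might be packed into a single prompt for an LLM. Everything here is an assumption made for illustration: the `Span` class, the `build_consolidation_prompt` function, the instruction wording, and the technique labels are hypothetical and are not taken from the paper's scripts. No model call is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One annotated span: character offsets [start, end) plus a technique label."""
    start: int
    end: int
    technique: str


def build_consolidation_prompt(text: str, annotations: dict[str, list[Span]]) -> str:
    """Format a paragraph and its per-annotator spans into one prompt string.

    The instruction wording is hypothetical, not the prompt used in the paper.
    """
    lines = [
        "You are an expert annotator consolidating propaganda span annotations.",
        "Paragraph:",
        text,
        "Annotations from individual annotators:",
    ]
    for annotator, spans in annotations.items():
        # List each annotator's spans in reading order.
        for span in sorted(spans, key=lambda s: (s.start, s.end)):
            # Reject empty spans and offsets that fall outside the paragraph.
            if not 0 <= span.start < span.end <= len(text):
                raise ValueError(f"span out of range for {annotator}: {span}")
            snippet = text[span.start:span.end]
            lines.append(
                f'- {annotator}: "{snippet}" [{span.start}:{span.end}] -> {span.technique}'
            )
    lines.append("Return the final list of spans with one technique label each.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy paragraph and labels, invented for this example.
    paragraph = "They are traitors who will destroy our great nation."
    votes = {
        "annotator_1": [Span(9, 17, "Name_Calling-Labeling")],
        "annotator_2": [
            Span(0, 17, "Name_Calling-Labeling"),
            Span(27, 51, "Loaded_Language"),
        ],
    }
    print(build_consolidation_prompt(paragraph, votes))
```

Including the character offsets next to each quoted snippet is one possible design choice: it lets the model's output be mapped back to exact positions in the paragraph even when the same phrase occurs more than once.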

Large Language Models for Propaganda Detection

Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles

Large Language Models for Multi-label Propaganda Detection

GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text

Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames

Large Language Models for Data Annotation and Synthesis: A Survey

Large Language Models for Data Annotation: A Survey

Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language

Watch Your Language: Investigating Content Moderation with Large Language Models

Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency

Munir: Weakly Supervised Transformer for Arabic Computational Propaganda Detection on Social Media

Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing

Perils and opportunities in using large language models in psychological research

Investigating Annotator Bias in Large Language Models for Hate Speech Detection

Large Language Model Soft Ideologization via AI-Self-Consciousness

Evaluating Large Language Models in Analysing Classroom Dialogue

Large Language Models for Automatic Detection of Sensitive Topics

An Investigation of Large Language Models for Real-World Hate Speech Detection

Keeping Humans in the Loop: Human-Centered Automated Annotation with Generative AI

AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators