GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text

Kyle Hamilton,Luca Longo,Bojan Bozic

DOI: https://doi.org/10.1145/3589335.3651909

2024-07-16

Abstract:While the use of machine learning for the detection of propaganda techniques in text has garnered considerable attention, most approaches focus on "black-box" solutions with opaque inner workings. Interpretable approaches provide a solution, however, they depend on careful feature engineering and costly expert annotated data. Additionally, language features specific to propagandistic text are generally the focus of rhetoricians or linguists, and there is no data set labeled with such features suitable for machine learning. This study codifies 22 rhetorical and linguistic features identified in literature related to the language of persuasion for the purpose of annotating an existing data set labeled with propaganda techniques. To help human experts annotate natural language sentences with these features, RhetAnn, a web application, was specifically designed to minimize an otherwise considerable mental effort. Finally, a small set of annotated data was used to fine-tune GPT-3.5, a generative large language model (LLM), to annotate the remaining data while optimizing for financial cost and classification accuracy. This study demonstrates how combining a small number of human annotated examples with GPT can be an effective strategy for scaling the annotation process at a fraction of the cost of traditional annotation relying solely on human experts. The results are on par with the best performing model at the time of writing, namely GPT-4, at 10x less the cost. Our contribution is a set of features, their properties, definitions, and examples in a machine-readable format, along with the code for RhetAnn and the GPT prompts and fine-tuning procedures for advancing state-of-the-art interpretable propaganda technique detection.

Computation and Language,Artificial Intelligence

What problem does this paper attempt to address?

The paper aims to address the following issues: 1. **Interpretability Issue**: While the use of machine learning to detect propaganda techniques in news texts has received widespread attention, most methods rely on "black-box" solutions that lack transparency. Interpretable methods require careful feature engineering and expensive manually labeled data. 2. **Language Feature Annotation Issue**: Language features related to propaganda are often of interest to rhetoricians or linguists, but there are no suitable datasets for machine learning to annotate these features. 3. **Efficient Annotation Strategy**: The paper proposes a method that combines a small number of manually labeled samples with a GPT model to significantly reduce the cost of traditional annotation, which relies entirely on human experts, and to improve classification accuracy. Through the above work, the paper demonstrates how to effectively combine a small number of manually labeled samples with GPT to extend the annotation process at a lower cost, achieving results comparable to the state-of-the-art models at the time (such as GPT-4), with only one-tenth of the cost.

GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text

Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles

Large Language Models for Propaganda Detection

PropaInsight: Toward Deeper Understanding of Propaganda in Terms of Techniques, Appeals, and Intent

Large Language Models for Propaganda Span Annotation

Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification

Prta: A System to Support the Analysis of Propaganda Techniques in the News

Together we can do it! A roadmap to effectively tackle propaganda-related tasks

Large Language Models for Multi-label Propaganda Detection

HQP: A Human-Annotated Dataset for Detecting Online Propaganda

Hierarchical Multi-Instance Multi-Label Learning for Detecting Propaganda Techniques

Automated stance detection in complex topics and small languages: The challenging case of immigration in polarizing news media

Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection

G-HFIN: Graph-based Hierarchical Feature Integration Network for propaganda detection of We-media news articles

Closing the Loop: Testing ChatGPT to Generate Model Explanations to Improve Human Labelling of Sponsored Content on Social Media

GPT-4 as an X data annotator: Unraveling its performance on a stance classification task

Discourse Structures Guided Fine-grained Propaganda Identification

HAPI: An efficient Hybrid Feature Engineering-based Approach for Propaganda Identification in social media

GPT is an effective tool for multilingual psychological text analysis

Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames

Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover