Abstract: Propaganda is a rhetorical technique designed to serve a specific topic. Because of its particular psychological effect, it is often used deliberately in news articles to achieve an intended purpose. It is therefore important to know where and which propaganda techniques are used in the news, so that readers can efficiently understand an article's theme in daily life. Some research on propaganda detection has been proposed recently, but the results remain unsatisfactory, so the detection of propaganda techniques in news articles still needs further research. In this paper, we introduce our systems for detecting propaganda techniques in news articles. The problem is split into two tasks: Span Identification and Technique Classification. For each task, we design a system based on the popular pretrained BERT model. Furthermore, we adopt over-sampling and EDA strategies and propose a sentence-level feature concatenation method for our systems. Experiments on the dataset of about 550 news articles provided by SemEval show that our systems achieve state-of-the-art performance.
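The over-sampling and EDA strategies mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions and is not the authors' code. We assume that "EDA" refers to the Easy Data Augmentation operations of Wei and Zou (2019). Only random swap and random deletion are implemented here, because synonym replacement and random insertion need a lexicon such as WordNet. We also assume that over-sampling means randomly duplicating examples of under-represented labels until every label is as frequent as the largest one. Examples are treated as (tokens, label) pairs, as in technique classification. The function names and the example technique labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical example format: (tokens of a fragment or sentence, technique label).
Example = Tuple[List[str], str]


def oversample(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly duplicate examples of smaller classes until every label is as
    frequent as the most frequent one (an assumed over-sampling rule)."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)
    target = max(len(items) for items in by_label.values())
    out = list(examples)
    for items in by_label.values():
        out.extend(rng.choice(items) for _ in range(target - len(items)))
    return out


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """EDA random swap: exchange two randomly chosen positions n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """EDA random deletion: drop each token with probability p,
    but never return an empty sequence for non-empty input."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(examples: List[Example], n_aug: int = 2, seed: int = 0) -> List[Example]:
    """Append n_aug EDA variants of each example; the label is copied unchanged."""
    rng = random.Random(seed)
    out = list(examples)
    for tokens, label in examples:
        for k in range(n_aug):
            # Alternate between the two EDA operations that need no synonym lexicon.
            if k % 2 == 0:
                new_tokens = random_swap(tokens, 1, rng)
            else:
                new_tokens = random_deletion(tokens, 0.1, rng)
            out.append((new_tokens, label))
    return out


if __name__ == "__main__":
    data = [(["a", "truly", "vile", "lie"], "Loaded_Language")] * 3 + [
        (["our", "great", "nation"], "Flag-Waving")
    ]
    balanced = oversample(data)
    augmented = augment(balanced, n_aug=2)
    print(len(balanced), len(augmented))  # 6 18
```

EDA changes token positions. Applying it to span identification would therefore also require realigning the token-level span labels. The sketch avoids this by working only with fragment-level labels.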

What problem does this paper attempt to address?

The paper attempts to address the problem of detecting propaganda techniques in news articles. Specifically, the authors focus on two tasks:

1. **Span Identification**: determining which parts of a news article contain propaganda techniques.
2. **Technique Classification**: identifying the specific propaganda technique used in each of these parts.

### Background and Motivation

Propaganda techniques are a form of rhetoric. They are often used intentionally in news articles to achieve specific purposes because they have particular psychological effects. It is therefore important for people to know clearly which propaganda techniques are used, and where, so that they can efficiently grasp an article's theme in daily life. However, existing research on propaganda detection has produced unsatisfactory results, so further research is needed to improve detection.

### Main Contributions

1. **System Design**: The authors designed two systems based on the pretrained BERT model, one for span identification and one for technique classification.
2. **Three-class Method**: The binary span identification task was extended to a three-class task by adding an "invalid" label type, which reduces noise and improves accuracy.
3. **Sentence-level Feature Concatenation (SLFC)**: The authors introduced sentence-level feature concatenation in the span identification system. For the first time, sentence-level classification features were integrated into the representation of each word.
4. **Data Augmentation and Oversampling**: Oversampling and EDA (data augmentation) strategies were used to optimize the dataset, which improved the model's generalization ability and robustness.

### Experimental Results

- **Span Identification Task (SI)**: With oversampling, EDA, and sentence-level feature concatenation, the authors' system achieved an F1 score of 44.1732% on the test set. This significantly outperformed the baseline model and other methods.
- **Technique Classification Task (TC)**: With the EDA strategy, the authors' system achieved an F1 score of 57.5729% on the test set, an improvement of about 3% over the model without EDA.

### Conclusion and Future Work

The authors proposed two BERT-based systems for span identification and technique classification in news articles. Experiments showed that these systems performed strongly on the propaganda detection task and improved on the plain BERT model. Future work includes three directions: adopting methods from the SpanBERT model, exploring more suitable BERT architectures, and compressing the model to make it more practical and convenient to use.
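The two span-identification ideas in the contributions list, the three-class labeling and SLFC, might look roughly like the sketch below. The summary does not define which tokens receive the "invalid" label. This sketch assumes they are BERT special tokens and WordPiece continuation pieces (prefixed with `##`), so that only word-initial pieces carry a real span label. It also assumes that SLFC appends one sentence-level feature vector to every token vector before token classification. That sentence vector could be, for example, the output of a sentence-level propaganda classifier. The label ids and function names are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List

# Hypothetical label ids for the three-class scheme.
OUTSIDE, PROPAGANDA, INVALID = 0, 1, 2

# Assumption: "invalid" covers BERT special tokens and WordPiece continuations.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def to_three_class(pieces: List[str], binary: List[int]) -> List[int]:
    """Map binary token labels (1 = inside a propaganda span, 0 = outside)
    to three classes, giving INVALID to positions that are not word starts."""
    if len(pieces) != len(binary):
        raise ValueError("pieces and binary labels must have the same length")
    labels = []
    for piece, b in zip(pieces, binary):
        if piece in SPECIAL_TOKENS or piece.startswith("##"):
            labels.append(INVALID)
        else:
            labels.append(PROPAGANDA if b else OUTSIDE)
    return labels


def concat_sentence_feature(
    token_feats: List[List[float]], sent_feat: List[float]
) -> List[List[float]]:
    """SLFC sketch: append the same sentence-level feature vector to every
    token vector, so the token classifier sees both token and sentence context."""
    return [tok + sent_feat for tok in token_feats]


if __name__ == "__main__":
    pieces = ["[CLS]", "crooked", "politic", "##ians", "lie", "[SEP]"]
    binary = [0, 1, 1, 1, 0, 0]
    print(to_three_class(pieces, binary))  # [2, 1, 1, 2, 0, 2]
    token_feats = [[0.1, 0.2], [0.3, 0.4]]
    sent_feat = [0.9]  # e.g. a sentence-level propaganda probability (assumed)
    print(concat_sentence_feature(token_feats, sent_feat))
    # [[0.1, 0.2, 0.9], [0.3, 0.4, 0.9]]
```

In a real BERT model, the lists would be tensors and the concatenation would run along the feature dimension. Under the assumption above, positions predicted as INVALID would be skipped when token predictions are mapped back to character spans.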

Span identification and technique classification of propaganda in news articles
