Span identification and technique classification of propaganda in news articles

Wei Li, Shiqian Li, Chenhao Liu, Longfei Lu, Ziyu Shi, Shiping Wen
DOI: https://doi.org/10.1007/s40747-021-00393-y
2021-05-08
Abstract: Propaganda is a rhetorical technique designed to serve a specific topic, and it is often used purposefully in news articles to achieve an intended purpose because of its specific psychological effect. It is therefore important to make clear where propaganda techniques are used in the news, and which ones, so that people can understand an article's theme efficiently in daily life. Some recent work has addressed propaganda detection, but with unsatisfactory results, so the detection of propaganda techniques in news articles is badly in need of research. In this paper, we introduce our systems for detecting propaganda techniques in news articles, a problem that is split into two tasks: Span Identification and Technique Classification. For each of these two tasks, we design a system based on the popular pretrained BERT model. Furthermore, we adopt over-sampling and EDA strategies and propose a sentence-level feature concatenating method in our systems. Experiments on a dataset of about 550 news articles offered by SEMEVAL show that our systems achieve state-of-the-art performance.
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper attempts to address the problem of detecting propaganda techniques in news articles. Specifically, the authors focus on two tasks:

1. **Span Identification**: Determining which parts of the news articles contain propaganda techniques.
2. **Technique Classification**: Identifying the specific propaganda techniques used in these parts.

### Background and Motivation

Propaganda techniques are a form of rhetoric often used intentionally in news articles to achieve specific purposes, because they have particular psychological effects. It is therefore important for readers to know which propaganda techniques are used in a news article, and where, so that they can grasp the article's theme efficiently. Existing research on propaganda detection is not satisfactory, so further research is needed to improve detection effectiveness.

### Main Contributions

1. **System Design**: The authors designed two systems based on the pre-trained BERT model, one for span identification and the other for technique classification.
2. **Three-class Method**: The binary span identification task was extended to a three-class task by adding an "invalid" label type to reduce noise and improve accuracy.
3. **Sentence-level Feature Concatenation (SLFC)**: The span identification system introduces sentence-level feature concatenation, integrating sentence-level classification features into each word for the first time.
4. **Data Augmentation and Oversampling**: Oversampling and EDA (data augmentation) strategies were used to optimize the dataset, enhancing the model's generalization ability and robustness.

### Experimental Results

- **Span Identification Task (SI)**: Through oversampling, EDA, and sentence-level feature concatenation, the authors' system achieved an F1 score of 44.1732% on the test set, significantly outperforming the baseline model and other methods.
- **Technique Classification Task (TC)**: By using the EDA strategy, the authors' system achieved an F1 score of 57.5729% on the test set, an improvement of about 3% compared to the model without EDA.

### Conclusion and Future Work

The authors proposed two systems based on the BERT model, one for span identification and one for technique classification in news articles. Experimental validation showed that these systems performed excellently in the propaganda detection task and improved on the BERT model. Future work directions include adopting methods from the SpanBERT model, exploring more suitable BERT architectures, and compressing the model to enhance its applicability and convenience.
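To make the three-class labeling and sentence-level feature concatenation from the Main Contributions more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The summary does not say what the "invalid" label covers, so the sketch assumes it marks padding positions. The function names, the label ids, and the use of plain lists in place of BERT hidden states are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical label ids. ASSUMPTION: "invalid" marks padding positions;
# the summary does not define what the third class covers.
OUTSIDE, PROPAGANDA, INVALID = 0, 1, 2


def three_class_labels(
    token_spans: List[Tuple[int, int]],
    propaganda_spans: List[Tuple[int, int]],
    max_len: int,
) -> List[int]:
    """Label each token by character overlap with a propaganda span,
    then pad (or truncate) to max_len using the INVALID class."""
    labels = []
    for start, end in token_spans:
        # Half-open intervals [start, end) overlap when each starts before the other ends.
        inside = any(start < p_end and end > p_start
                     for p_start, p_end in propaganda_spans)
        labels.append(PROPAGANDA if inside else OUTSIDE)
    labels = labels[:max_len]
    labels += [INVALID] * (max_len - len(labels))
    return labels


def concat_sentence_feature(
    token_features: List[List[float]],
    sentence_feature: List[float],
) -> List[List[float]]:
    """Append the same sentence-level feature vector to every token's
    feature vector, so a per-token classifier sees both."""
    return [tok + sentence_feature for tok in token_features]


if __name__ == "__main__":
    # Three tokens given as character offsets; the last two fall in a propaganda span.
    print(three_class_labels([(0, 3), (4, 9), (10, 15)], [(4, 15)], max_len=5))
    # Two tokens with 2-d features, one 1-d sentence-level feature.
    print(concat_sentence_feature([[0.1, 0.2], [0.3, 0.4]], [0.9]))
```

In a real system the token features would presumably be BERT outputs and the sentence feature would come from a sentence-level classifier. The concatenation step itself stays this simple.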
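The oversampling and EDA strategies named in the Main Contributions can be sketched in the same hedged way. EDA ("Easy Data Augmentation") is usually described as four text operations: synonym replacement, random insertion, random swap, and random deletion. The sketch below shows only random swap and random deletion, which need no thesaurus, together with simple random oversampling of minority classes. The function names and parameters are illustrative assumptions, and the summary does not state which operations or sampling ratios the authors used.

```python
import random
from collections import Counter
from typing import List, Tuple


def random_swap(words: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """EDA-style random swap: exchange two randomly chosen positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """EDA-style random deletion: drop each word with probability p,
    but always keep at least one word of a non-empty input."""
    if not words:
        return []
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def oversample(examples: List[Tuple[str, int]], rng: random.Random) -> List[Tuple[str, int]]:
    """Randomly duplicate examples of smaller classes until every class
    has as many examples as the largest one."""
    if not examples:
        return []
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = "the enemy will destroy everything we love".split()
    print(random_swap(sentence, n_swaps=1, rng=rng))
    print(random_deletion(sentence, p=0.2, rng=rng))
    data = [("loaded language example", 1)] + [("neutral example", 0)] * 4
    print(Counter(label for _, label in oversample(data, rng)))
```

For span-level tasks, swapping or deleting words shifts character offsets, so the labels would have to be recomputed after augmentation. The sketch leaves that step out.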