ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model

Zhiwei Liu, Boyang Liu, Paul Thompson, Kailai Yang, Sophia Ananiadou
2024-08-12
Abstract: The internet has brought both benefits and harms to society. A prime example of the latter is misinformation, including conspiracy theories, which flood the web. Recent advances in natural language processing, particularly the emergence of large language models (LLMs), have improved the prospects of accurate misinformation detection. However, most LLM-based approaches to conspiracy theory detection focus only on binary classification and fail to account for the important relationship between misinformation and affective features (i.e., sentiment and emotions). Driven by a comprehensive analysis of conspiracy text that reveals its distinctive affective features, we propose ConspEmoLLM, the first open-source LLM that integrates affective information and is able to perform diverse tasks relating to conspiracy theories. These tasks include not only conspiracy theory detection, but also classification of theory type and detection of related discussion (e.g., opinions towards theories). ConspEmoLLM is fine-tuned based on an emotion-oriented LLM using our novel ConDID dataset, which includes five tasks to support LLM instruction tuning and evaluation. We demonstrate that when applied to these tasks, ConspEmoLLM largely outperforms several open-source general domain LLMs and ChatGPT, as well as an LLM that has been fine-tuned using ConDID, but which does not use affective features. This project will be released on https://github.com/lzw108/ConspEmoLLM/.
Computation and Language
What problem does this paper attempt to address?
This paper aims to address the following issues:

1. **Improving the accuracy of conspiracy theory detection**: Most current methods based on large language models (LLMs) focus only on binary classification (i.e., whether a text contains a conspiracy theory) and overlook the close connection between conspiracy theories and affective features. The researchers therefore aim to improve automatic conspiracy theory detection by making use of affective information.
2. **Developing a multi-task conspiracy theory detection dataset**: To support the use of LLMs for conspiracy theory detection, the research team constructed a multi-task instruction-tuning dataset named ConDID. The dataset covers not only the identification of conspiracy theories but also tasks such as conspiracy theory type classification and related discussion detection.
3. **Proposing an LLM that integrates emotional information**: The researchers propose a new open-source LLM called ConspEmoLLM, which is designed for a range of conspiracy theory detection tasks and uses affective information to support more accurate detection.

Specifically, the paper addresses the following aspects:

- **Research background**: With the growth of the internet and social media, the spread of misinformation, especially conspiracy theories, has accelerated, with negative effects on society. Efficient methods are therefore needed to detect such content automatically.
- **Importance of emotional features**: Previous studies have shown a close association between misinformation (including conspiracy theories) and affective features. This study further explores the value of affective information (i.e., sentiment and emotions) for understanding conspiracy theories.
- **Technical challenges**: Traditional pre-trained language models (such as BERT and RoBERTa) are limited by their smaller parameter sizes when handling complex tasks. Recently developed LLMs perform well, but they still fall short in using affective features for detailed analysis.

In summary, the main goal of this paper is to improve conspiracy theory detection by introducing emotion analysis and constructing a specialized dataset, thereby enhancing the ability to automatically identify and classify conspiracy theory texts.
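
To make the idea of a multi-task instruction-tuning dataset more concrete, the sketch below shows one way such records could be represented. It is only an illustration: the task names, prompt wording, and the `InstructionRecord`, `build_record`, and `to_prompt` helpers are all hypothetical, since the actual ConDID templates and fields are not described in this text.

```python
from dataclasses import dataclass

# Hypothetical task names and prompt wording (assumptions for illustration;
# the real ConDID instructions are not given in this summary).
TASK_PROMPTS = {
    "detection": "Does the following text contain a conspiracy theory? Answer yes or no.",
    "type_classification": "Which type of conspiracy theory does the following text discuss?",
    "related_discussion": "Is the following text part of a discussion related to a conspiracy theory? Answer yes or no.",
}


@dataclass
class InstructionRecord:
    """One instruction-tuning example: task instruction, input text, target answer."""
    instruction: str
    input: str
    output: str


def build_record(task: str, text: str, label: str) -> InstructionRecord:
    """Pair a text and its label with the instruction for the chosen task."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task}")
    return InstructionRecord(TASK_PROMPTS[task], text, label)


def to_prompt(record: InstructionRecord) -> str:
    """Render the record as the prompt a model would see; the answer is left open."""
    return f"Task: {record.instruction}\nText: {record.input}\nAnswer:"


if __name__ == "__main__":
    rec = build_record("detection", "Example text.", "no")
    print(to_prompt(rec))
```

Keeping every task in the same instruction/input/output shape is what would let a single model be tuned on all tasks at once; only the instruction string changes between them.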