IsamasRed: A Public Dataset Tracking Reddit Discussions on Israel-Hamas Conflict

Kai Chen, Zihao He, Keith Burghardt, Jingxin Zhang, Kristina Lerman
2024-04-17
Abstract: The conflict between Israel and Palestinians significantly escalated after the October 7, 2023 Hamas attack, capturing global attention. To understand the public discourse on this conflict, we present a meticulously compiled dataset, IsamasRed, comprising nearly 400,000 conversations and over 8 million comments from Reddit, spanning from August 2023 to November 2023. We introduce an innovative keyword extraction framework leveraging a large language model to effectively identify pertinent keywords, ensuring a comprehensive data collection. Our initial analysis on the dataset, examining topics, controversy, emotional and moral language trends over time, highlights the emotionally charged and complex nature of the discourse. This dataset aims to enrich the understanding of online discussions, shedding light on the complex interplay between ideology, sentiment, and community engagement in digital spaces.
Social and Information Networks, Computers and Society, Digital Libraries
What problem does this paper attempt to address?
The paper primarily addresses the following issues:
1. **Constructing a Comprehensive Dataset**: The paper introduces a systematic approach to collecting Reddit discussions related to the 2023 Israel-Hamas conflict. The resulting user-generated content gives researchers a broad view of the perspectives and narratives surrounding the conflict.
2. **Automated Keyword Extraction Framework**: To improve the efficiency and accuracy of data collection, the paper proposes an automated keyword extraction framework based on large language models (LLMs). The framework identifies keywords related to the Israel-Hamas conflict so that the collected data is both comprehensive and targeted.
3. **Analyzing Public Opinion Formation Mechanisms**: Through analysis of the collected data, the paper explores trends in public attitudes, emotions, and moral dimensions regarding the Israel-Hamas conflict. Specifically, the research focuses on user engagement, controversy, emotions, and moral foundations to reveal the dynamics of online discussions.
4. **Creating Specific Subtopic Data Subsets**: To study two key subtopics associated with the conflict in more detail (Zionism/Antisemitism and Free Palestine/Islamophobia), the paper also creates two sub-datasets, IsamasRed-Z and IsamasRed-P.
5. **Statistical and Semantic Analysis**: The paper conducts preliminary statistical analysis of the data, including changes in the number of submissions and comments over time, trends in controversy, and activity levels in different subforums. In addition, natural language processing techniques are used to analyze the moral foundations and emotional expressions in the comments.
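To make the keyword-based collection and the "submissions over time" statistic more concrete, the following is a minimal sketch in Python. It is not the authors' pipeline: the keyword list, the record fields (`text`, `created_utc`), and the toy posts are all hypothetical and chosen only for illustration. In the paper, the keywords come from an LLM-based extraction framework rather than a hand-written list.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Hypothetical keyword list for illustration only; the paper derives its
# keywords with an LLM-based framework, which is not reproduced here.
KEYWORDS = ["israel", "hamas", "gaza", "palestine"]

# Case-insensitive whole-word match against any keyword.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_relevant(text: str) -> bool:
    """Return True if the text contains at least one keyword as a whole word."""
    return PATTERN.search(text) is not None


def daily_counts(posts):
    """Count keyword-matching posts per UTC calendar day.

    Each post is assumed to be a dict with a 'text' string and a
    'created_utc' Unix timestamp (hypothetical field names).
    """
    counts = Counter()
    for post in posts:
        if is_relevant(post["text"]):
            day = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
            counts[day.date().isoformat()] += 1
    return dict(counts)


# Toy records, invented for this example (not taken from the dataset).
POSTS = [
    {"text": "Ceasefire talks in Gaza continue", "created_utc": 1696636800},
    {"text": "Best pizza recipes", "created_utc": 1696640400},
    {"text": "Hamas and Israel statements today", "created_utc": 1696680000},
    {"text": "Protests for Palestine downtown", "created_utc": 1696723200},
]

if __name__ == "__main__":
    for day, n in sorted(daily_counts(POSTS).items()):
        print(day, n)
```

Whole-word matching keeps the filter precise but misses inflected forms such as "Israeli" or "Palestinians", which is one reason a broader, systematically extracted keyword set matters for coverage.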
In summary, the paper aims to deepen our understanding of public discussions on social media about the Israel-Hamas conflict by establishing a comprehensive dataset and adopting innovative methodologies. It also seeks to explore the complex ideologies, emotions, and social interaction mechanisms behind these discussions.