A Survey of Datasets for Information Diffusion Tasks

Fuxia Guo, Xiaowen Wang, Yanwei Xie, Zehao Wang, Jingqiu Li, Lanjun Wang
2024-07-07
Abstract: Information diffusion across various new media platforms gradually influences the perceptions, decisions, and social behaviors of individual users. In communication studies, the well-known Five W's of Communication model (5W Model) clearly describes the process of information diffusion. Although many studies and corresponding datasets on information diffusion have emerged, a systematic categorization of tasks and an integration of datasets are still lacking. To address this gap, we present a systematic taxonomy of information diffusion tasks and datasets based on the 5W Model framework. We first categorize information diffusion tasks into ten subtasks, each with a definition and dataset analysis, under three main tasks: information diffusion prediction, social bot detection, and misinformation detection. We also collect a repository of publicly available datasets for information diffusion tasks, with links, and compare them on six attributes relating to users and content: user information, social network, bot label, propagation content, propagation network, and veracity label. In addition, we discuss the limitations of current datasets and research topics and outline future directions for the field. The dataset repository can be accessed at https://github.com/fuxiaG/Information-Diffusion-Datasets.
Social and Information Networks, Information Retrieval
What problem does this paper attempt to address?
This paper addresses the lack of a systematic classification of information diffusion tasks and of an integrated view of the datasets that support them. Many studies and datasets on information diffusion exist, but no common framework organizes them. The authors therefore propose a classification of information diffusion tasks based on the "5W Model" (who, says what, through which channel, to whom, with what effect) and compile publicly available datasets for each task. The main contributions of the paper are:

1. **Task Classification**: Information diffusion tasks are divided into three main tasks: information diffusion prediction, social bot detection, and misinformation detection. These are further subdivided into ten subtasks, each with a definition and related datasets.
2. **Dataset Organization**: Publicly available datasets are collected together with their sources and links, and compared on six attributes: user information, social network, bot label, propagation content, propagation network, and veracity label.
3. **Limitations and Future Directions**: The paper discusses the shortcomings of current datasets and research topics and suggests improvements to advance the field.

Through this work, the paper aims to give researchers a reference framework for understanding and studying information diffusion.
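
One way to picture the six-attribute comparison is as a set of flags on each dataset record, which can then be filtered by what a task needs. The following is a minimal Python sketch under stated assumptions: the dataset names (`example_a`, `example_b`), the `DatasetRecord` type, and the task-to-attribute mapping in `TASK_REQUIREMENTS` are all hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass

# The six comparison attributes named in the abstract.
ATTRIBUTES = (
    "user_information",
    "social_network",
    "bot_label",
    "propagation_content",
    "propagation_network",
    "veracity_label",
)


@dataclass(frozen=True)
class DatasetRecord:
    """A dataset described by the subset of ATTRIBUTES it provides."""
    name: str
    attributes: frozenset


# ASSUMPTION: illustrative requirements only; the paper's actual
# mapping from tasks to attributes may differ.
TASK_REQUIREMENTS = {
    "information_diffusion_prediction": {"propagation_network"},
    "social_bot_detection": {"user_information", "bot_label"},
    "misinformation_detection": {"propagation_content", "veracity_label"},
}


def datasets_for_task(task, records):
    """Return names of datasets that provide every attribute the task needs."""
    required = TASK_REQUIREMENTS[task]
    return [r.name for r in records if required <= r.attributes]


# Hypothetical records, not real datasets.
records = [
    DatasetRecord(
        "example_a",
        frozenset({"user_information", "social_network", "bot_label"}),
    ),
    DatasetRecord(
        "example_b",
        frozenset({"propagation_content", "propagation_network", "veracity_label"}),
    ),
]

if __name__ == "__main__":
    for task in TASK_REQUIREMENTS:
        print(task, datasets_for_task(task, records))
```

Representing attributes as a set keeps the filter to a single subset check, so adding a seventh attribute or a new task would not require changing `datasets_for_task`.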