The Impact of Data Persistence Bias on Social Media Studies

Tuğrulcan Elmas

DOI: https://doi.org/10.1145/3578503.3583630

2023-03-02

Abstract: Social media studies often collect data retrospectively to analyze public opinion. Social media data may decay over time and such decay may prevent the collection of the complete dataset. As a result, the collected dataset may differ from the complete dataset and the study may suffer from data persistence bias. Past research suggests that the datasets collected retrospectively are largely representative of the original dataset in terms of textual content. However, no study analyzed the impact of data persistence bias on social media studies such as those focusing on controversial topics. In this study, we analyze the data persistence and the bias it introduces on the datasets of three types: controversial topics, trending topics, and framing of issues. We report which topics are more likely to suffer from data persistence among these datasets. We quantify the data persistence bias using the change in political orientation, the presence of potentially harmful content and topics as measures. We found that controversial datasets are more likely to suffer from data persistence and they lean towards the political left upon recollection. The turnout of the data that contain potentially harmful content is significantly lower on non-controversial datasets. Overall, we found that the topics promoted by right-aligned users are more likely to suffer from data persistence. Account suspensions are the primary factor contributing to data removals, if not the only one. Our results emphasize the importance of accounting for the data persistence bias by collecting the data in real time when the dataset employed is vulnerable to data persistence bias.

Social and Information Networks, Computers and Society

What problem does this paper attempt to address?

The paper explores the impact of data persistence bias on social media research, particularly the issues that arise when data is collected retrospectively. The authors point out that, over time, some social media data may be deleted or become inaccessible. This creates discrepancies between the dataset used for analysis and the original dataset, which may introduce bias. The core contributions of the paper include:

1. **Quantifying Data Persistence Bias**: The authors are the first to quantify data persistence bias and analyze how this bias affects different types of topics in social media research, especially controversial topics, trending topics, and issue framing.
2. **Analyzing the Impact of Bias**: By analyzing factors such as changes in political leanings, the presence of potentially harmful content, and the sources of topics, the paper evaluates the specific impact of data persistence bias on social media research.
3. **Case Studies**: The paper discusses three case studies, including controversial topics, trending topics, and issue framing in the immigration debate, to understand which topics are more susceptible to data persistence bias.
4. **Factor Analysis**: The study also explores the factors leading to data deletion, finding that account bans are the main cause of data loss, while user-initiated content deletion is also a significant factor.

In summary, the paper aims to reveal the issues caused by data persistence bias in social media research and emphasizes the importance of real-time data collection to reduce the impact of such bias. Additionally, the authors provide practical recommendations, such as encouraging data sharing, to improve the reliability and reproducibility of research.
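To make the idea of quantifying data persistence bias concrete, here is a minimal Python sketch. It is not the paper's actual method: the function names, the per-post `lean` score (negative for left, positive for right), and the toy data are all assumptions made for illustration. It computes the share of originally collected posts that survive recollection, and how the average political lean of the surviving posts differs from that of the original dataset.

```python
from typing import Dict, Iterable, Set


def persistence_rate(original_ids: Set[str], recollected_ids: Set[str]) -> float:
    """Share of originally collected posts still retrievable on recollection."""
    if not original_ids:
        raise ValueError("original dataset is empty")
    return len(original_ids & recollected_ids) / len(original_ids)


def mean_lean(ids: Iterable[str], lean_by_id: Dict[str, float]) -> float:
    """Average lean score over the posts that have a score."""
    scores = [lean_by_id[i] for i in ids if i in lean_by_id]
    if not scores:
        raise ValueError("no scored posts in this set")
    return sum(scores) / len(scores)


def orientation_shift(
    original_ids: Set[str],
    recollected_ids: Set[str],
    lean_by_id: Dict[str, float],
) -> float:
    """Mean lean of surviving posts minus mean lean of the original dataset.

    With the assumed convention (-1.0 = left, +1.0 = right), a negative
    value means the recollected dataset leans further left than the original.
    """
    surviving = original_ids & recollected_ids
    return mean_lean(surviving, lean_by_id) - mean_lean(original_ids, lean_by_id)


# Toy data (hypothetical): four posts collected in real time, one of which
# (a right-leaning post) is no longer available at recollection time.
original = {"a", "b", "c", "d"}
recollected = {"a", "b", "c"}
lean = {"a": -1.0, "b": -1.0, "c": 1.0, "d": 1.0}

rate = persistence_rate(original, recollected)
shift = orientation_shift(original, recollected, lean)
print(f"persistence rate: {rate:.2f}, orientation shift: {shift:+.2f}")
```

In this toy case, losing a single right-leaning post moves the average lean of the dataset to the left, which is the kind of effect the paper reports for controversial datasets. A real analysis would need a validated way of assigning lean scores, which this sketch simply assumes is given.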
