Birdspotter: A Tool for Analyzing and Labeling Twitter Users

Rohit Ram, Quyu Kong, Marian-Andrei Rizoiu
DOI: https://doi.org/10.1145/3437963.3441695
2021-02-23
Abstract: The impact of online social media on societal events and institutions is profound; and with the rapid increases in user uptake, we are just starting to understand its ramifications. Social scientists and practitioners who model online discourse as a proxy for real-world behavior often curate large social media datasets. A lack of available tooling aimed at non-data science experts frequently leaves this data (and the insights it holds) underutilized. Here, we propose birdspotter -- a tool to analyze and label Twitter users --, and birdspotter.ml -- an exploratory visualizer for the computed metrics. birdspotter provides an end-to-end analysis pipeline, from the processing of pre-collected Twitter data, to general-purpose labeling of users, and estimating their social influence, within a few lines of code. The package features tutorials and detailed documentation. We also illustrate how to train birdspotter into a fully-fledged bot detector that achieves better than state-of-the-art performances without making any Twitter API online calls, and we showcase its usage in an exploratory analysis of a topical COVID-19 dataset.
Computers and Society, Social and Information Networks
What problem does this paper attempt to address?
This paper addresses three main problems:

1. **Lack of user analysis tools for non-data-science experts**:
   - Most existing tools target brand management and the analysis of individual users or organizations. None can retrospectively analyze and annotate all users in an already collected Twitter dataset. As a result, non-data-science experts (such as social scientists and journalists) often leave this data, and the insights it holds, underutilized.
   - The paper proposes a tool named birdspotter, which processes, describes, and applies general-purpose labels to existing Twitter datasets in a few lines of code, thus filling this gap.
2. **Quantifying the botness and influence of users in existing datasets**:
   - Existing state-of-the-art bot-detection tools (such as Botometer) rely on online API calls and cannot produce predictions for deleted or suspended accounts. By the time a dataset is analyzed, many bots that took part in the discussion may already have been banned by Twitter, so their botness can no longer be measured.
   - Similarly, existing influence-assessment tools usually require social graph information, which is very difficult to obtain retrospectively.
   - The paper trains birdspotter on four publicly labeled Twitter bot datasets, enabling it to detect bots without relying on online APIs, and reports better performance than the state-of-the-art Botometer. It also implements an influence-estimation method based on a diffusion model that requires no additional social graph information.
3. **Visualizing and exploring the breadth and depth of Twitter users and their activities**:
   - There is currently a lack of tools that provide both an overall view and a detailed view to help researchers understand the activity patterns of Twitter users.
   - The paper proposes birdspotter.ml, a visualization tool for analyzing the online discussions in which Twitter users participate. It offers an overall view of the user population and a detailed view of individual user activity.

### Formula presentation

- **Event intensity function of the Hawkes process**:
  \[ \lambda(t \mid H(T)) = \mu(t) + \sum_{t_{i} < t} \phi(t - t_{i}) \]
  where:
  - \(\mu(t)\) is the background intensity function,
  - \(\phi: \mathbb{R}^{+} \to \mathbb{R}^{+}\) is a kernel function that captures the decaying effect of past events.
- **Exponential kernel function**:
  \[ \phi_{\text{EXP}}(t) = \kappa \theta e^{-\theta t} \]
- **Power-law kernel function**:
  \[ \phi_{\text{PL}}(t) = \frac{\kappa}{(t + c)^{1 + \theta}} \]
- **Probability calculation of the marked Hawkes process**:
  \[ p_{ij} = \frac{\varphi(m_{i}, t_{j} - t_{i})}{\sum_{k = 1}^{j - 1} \varphi(m_{k}, t_{j} - t_{k})} \]
  where:
  - \(m_{j}\) is the number of followers of the user associated with event \(j\),
  - \(t_{j}\) is the time of event \(j\),
  - \(\varphi(m, \Delta t) = \kappa \theta m^{\beta} e^{-\theta \Delta t}\) is the marked Hawkes exponential kernel function.

Using these formulas and methods, birdspotter can analyze and annotate users in existing Twitter datasets without additional API calls.
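
The formulas above can be made concrete with a small numerical sketch. The code below is not birdspotter's implementation or API; it is a minimal, standard-library-only illustration of the intensity function, the two kernels, and the probabilities \(p_{ij}\). All function names, the toy event times and follower counts, and the parameter values are hypothetical. The background intensity is assumed constant for simplicity, and \(p_{ij}\) is read here as the probability that event \(j\) was triggered by the earlier event \(i\), which is an interpretation rather than something the summary states.

```python
import math


def exp_kernel(dt, kappa, theta):
    """Exponential kernel: phi_EXP(dt) = kappa * theta * exp(-theta * dt)."""
    return kappa * theta * math.exp(-theta * dt)


def power_law_kernel(dt, kappa, theta, c):
    """Power-law kernel: phi_PL(dt) = kappa / (dt + c)^(1 + theta)."""
    return kappa / (dt + c) ** (1.0 + theta)


def intensity(t, event_times, kernel, mu=0.0):
    """Hawkes intensity at time t: mu + sum of kernel(t - t_i) over t_i < t.

    Assumption: the background intensity mu is a constant, not a function.
    """
    return mu + sum(kernel(t - t_i) for t_i in event_times if t_i < t)


def marked_exp_kernel(m, dt, kappa, theta, beta):
    """Marked kernel: varphi(m, dt) = kappa * theta * m^beta * exp(-theta * dt)."""
    return kappa * theta * (m ** beta) * math.exp(-theta * dt)


def attribution_probabilities(times, followers, kappa, theta, beta):
    """Return a matrix p where p[i][j] follows the p_ij formula for i < j.

    `times` must be sorted in increasing order; indices are 0-based here,
    so the sum over k = 1..j-1 in the formula becomes range(j).
    Entries with i >= j stay 0.0. If every earlier event has zero weight
    (for example, all follower counts are 0 with beta > 0), column j is
    left at 0.0 to avoid dividing by zero.
    """
    n = len(times)
    p = [[0.0] * n for _ in range(n)]
    for j in range(1, n):
        weights = [
            marked_exp_kernel(followers[i], times[j] - times[i], kappa, theta, beta)
            for i in range(j)
        ]
        total = sum(weights)
        if total == 0.0:
            continue
        for i, w in enumerate(weights):
            p[i][j] = w / total
    return p


# Toy cascade (hypothetical values): three events with follower counts.
times = [0.0, 1.0, 2.5]
followers = [1000, 10, 50]
p = attribution_probabilities(times, followers, kappa=1.0, theta=0.5, beta=1.0)
for j in range(1, len(times)):
    print(j, [round(p[i][j], 4) for i in range(j)])
```

Because \(\kappa\theta\) appears in both the numerator and the denominator of \(p_{ij}\), it cancels, so in this sketch only \(\beta\) and \(\theta\) affect the probabilities. Each column \(j \geq 1\) sums to 1 whenever at least one earlier event has a non-zero weight.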