Does Twitter know your political views? POLiTweets dataset and semi-automatic method for political leaning discovery

Joanna Baran, Michał Kajstura, Maciej Ziółkowski, Krzysztof Rajda
DOI: https://doi.org/10.48550/arXiv.2207.07586
2022-06-14
Abstract: Every day, the world is flooded by millions of messages and statements posted on Twitter or Facebook. Social media platforms try to protect users' personal data, but there is still a real risk of misuse, including election manipulation. Did you know that only 13 posts addressing important or controversial topics for society are enough to predict one's political affiliation with a 0.85 F1-score? To examine this phenomenon, we created a novel universal method of semi-automated political leaning discovery. It relies on a heuristic data annotation procedure, which was evaluated to achieve 0.95 agreement with human annotators (measured as accuracy). We also present POLiTweets, the first publicly open Polish dataset for political affiliation discovery in a multi-party setup, consisting of over 147k tweets from almost 10k Polish-writing users annotated heuristically and almost 40k tweets from 166 users annotated manually as a test set. We used our data to study aspects of domain shift with respect to topics and the type of content writers: ordinary citizens vs. professional politicians.
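The abstract reports two numbers: agreement between heuristic and human labels, counted as accuracy, and political-affiliation prediction quality as an F1-score in a multi-party (i.e. multi-class) setup. The following is a minimal sketch of how such metrics can be computed, not the authors' evaluation code. It assumes one label per user and macro-averaged F1 (the abstract does not state the averaging scheme), and the party labels "A", "B", "C" are hypothetical placeholders.

```python
from typing import Sequence


def agreement_accuracy(heuristic: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of users whose heuristic label matches the human label."""
    if len(heuristic) != len(human) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(h == g for h, g in zip(heuristic, human))
    return matches / len(human)


def macro_f1(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed macro averaging)."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    classes = sorted(set(gold) | set(predicted))
    scores = []
    for c in classes:
        tp = sum(p == c and g == c for p, g in zip(predicted, gold))
        fp = sum(p == c and g != c for p, g in zip(predicted, gold))
        fn = sum(p != c and g == c for p, g in zip(predicted, gold))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels for four users; "A", "B", "C" stand in for parties.
    human = ["A", "A", "B", "C"]
    heuristic = ["A", "B", "B", "C"]
    print(agreement_accuracy(heuristic, human))  # 0.75
    print(round(macro_f1(heuristic, human), 3))  # 0.778
```

Macro averaging weights every party equally, so smaller parties affect the score as much as large ones; a weighted or micro average would give different numbers on the same predictions.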
Computation and Language, Machine Learning, Social and Information Networks