Crawling political communities in Twitter and extracting political affiliations

Muhammad Umer Gurchani
DOI: https://doi.org/10.48550/arXiv.2102.00849
2021-02-01
Abstract: In theory, a major advantage of the big data approach to studying online communities is that it should be possible to collect a representative random sample from a broadly defined population. In practice, however, data collection processes are not formalized, even for major social media platforms such as Twitter and Facebook. As a result, questions such as "how much data is enough?" and "how representative are the samples of the broader population being studied?" remain ambiguous for online social networks. In this paper, I propose a focused back-and-forth crawl approach and a validated seed choice method for collecting network-level data from Twitter. The proposed crawl method can extract community structures without needing a complete graph of the Twitter network, and it validates their size using a "reference score". It also addresses the sample size problem in Twitter by tracking the percentage of known nodes that have been included in the data. The method thus addresses most major problems in Twitter data collection procedures and moves a step closer to formalizing data collection methods for the platform. Once the communities are crawled and the network graph is clean and complete, it is possible to train machine learning classifiers, using communities as features, to predict the political affiliations of users on a larger scale. As a case study, I used the proposed method to separate French political communities on Twitter from the global Twitter community and to estimate the political affiliations of users on a continuous scale.
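The idea of tracking the percentage of known nodes that have been included in the data can be illustrated with a small sketch. This is not the paper's back-and-forth crawl, and it does not compute the "reference score", neither of which is specified in the abstract. It assumes a plain breadth-first crawl over a toy in-memory follower graph, and the names `crawl_with_coverage`, `graph`, `seeds`, and `target` are hypothetical.

```python
from collections import deque


def crawl_with_coverage(graph, seeds, target=0.9):
    """Breadth-first crawl that stops once the share of known nodes
    that have been crawled reaches `target`.

    Assumption: `graph` maps each account to the accounts it links to.
    A real crawler would fetch these links from the platform instead.
    """
    crawled = set()        # nodes whose links have been collected
    known = set(seeds)     # nodes seen as a seed or as a neighbour
    queue = deque(seeds)
    history = []           # coverage after each crawled node

    while queue:
        node = queue.popleft()
        if node in crawled:
            continue
        crawled.add(node)
        for neighbour in graph.get(node, ()):
            if neighbour not in known:
                known.add(neighbour)
                queue.append(neighbour)
        # Fraction of known nodes already included in the data.
        coverage = len(crawled) / len(known)
        history.append(coverage)
        if coverage >= target:
            break
    return crawled, known, history


# Toy graph standing in for a small, tightly linked community.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
crawled, known, history = crawl_with_coverage(toy_graph, ["a"], target=1.0)
```

In a tightly linked community, coverage rises quickly because newly crawled nodes mostly point back to nodes that are already known. A crawl that keeps discovering new nodes faster than it visits them shows the opposite pattern, which suggests the sample is still incomplete.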
Social and Information Networks