Predicting the 2020 US Presidential Election with Twitter

Michael Caballero
DOI: https://doi.org/10.5121/csit.2021.111006
2021-07-19
Abstract: One major sub-domain in the subject of polling public opinion with social media data is electoral prediction. Electoral prediction utilizing social media data could significantly affect campaign strategies, complementing traditional polling methods and providing cheaper polling in real time. First, this paper explores past successful methods from research for analysis and prediction of the 2020 US Presidential Election using Twitter data. Then, this research proposes a new method for electoral prediction which combines sentiment (from NLP on the text of tweets) and structural data with aggregate polling, a time series analysis, and a special focus on Twitter users critical to the election. Though this method performed worse than its baseline of polling predictions, it is inconclusive whether this is an accurate method for predicting elections due to scarcity of data. More research and more data are needed to accurately measure this method's overall effectiveness.
Social and Information Networks, Machine Learning
What problem does this paper attempt to address?
The paper asks whether Twitter data can be used to predict the result of the 2020 US presidential election, and whether such a prediction can match or exceed the accuracy of traditional polls. The author first reviews methods that succeeded in past research and then proposes a new prediction method. This method combines sentiment analysis of tweet text using natural language processing (NLP), structural data, aggregated polls, and time-series analysis, and it pays special attention to Twitter users who are critical to the election. Although the method performed worse than its polling baseline, the scarcity of data makes it inconclusive whether it is an effective way to predict elections. The paper concludes that more research and more data are needed to measure the method's overall effectiveness accurately.
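To make the idea of combining these signals more concrete, here is a minimal, hypothetical sketch. It is not the paper's implementation, because the summary does not give the paper's exact features, models, or weights. Everything below is an assumption for illustration: each tweet is presumed to be already labeled with the candidate it mentions and a sentiment score in [-1, 1] from some NLP model; tweets from "critical" users get a larger hypothetical `user_weight`; the daily Twitter share is smoothed with an exponential moving average as a simple stand-in for the time-series step; and the result is linearly blended with a polling average using an arbitrary weight `w_twitter`.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    day: int            # day index within the campaign window
    candidate: str      # candidate the tweet mentions (hypothetical labels, e.g. "A", "B")
    sentiment: float    # sentiment score in [-1, 1], assumed to come from some NLP model
    user_weight: float  # hypothetical weight; >1.0 for users deemed critical to the election


def daily_twitter_share(tweets, day, cand_a, cand_b):
    """Share of weighted positive sentiment going to cand_a on one day (0.5 if no signal)."""
    pos = {cand_a: 0.0, cand_b: 0.0}
    for t in tweets:
        # Only positive sentiment counts as support in this simplified sketch.
        if t.day == day and t.candidate in pos and t.sentiment > 0:
            pos[t.candidate] += t.sentiment * t.user_weight
    total = pos[cand_a] + pos[cand_b]
    return 0.5 if total == 0 else pos[cand_a] / total


def ema(values, alpha):
    """Exponential moving average over a non-empty series; returns the final smoothed value."""
    smoothed = values[0]
    for v in values[1:]:
        smoothed = alpha * v + (1 - alpha) * smoothed
    return smoothed


def blended_forecast(tweets, poll_share_a, days, cand_a, cand_b, alpha=0.3, w_twitter=0.2):
    """Blend the smoothed Twitter share with a polling-average share for cand_a."""
    series = [daily_twitter_share(tweets, d, cand_a, cand_b) for d in days]
    twitter_share = ema(series, alpha)
    return w_twitter * twitter_share + (1 - w_twitter) * poll_share_a


if __name__ == "__main__":
    # Toy, made-up data: two days of tweets and a hypothetical 52% polling average for "A".
    tweets = [
        Tweet(0, "A", 0.6, 1.0),
        Tweet(0, "B", 0.2, 1.0),
        Tweet(1, "A", 0.5, 1.0),
        Tweet(1, "B", 0.5, 1.0),
        Tweet(1, "B", -0.9, 3.0),  # negative sentiment is ignored
    ]
    forecast = blended_forecast(tweets, 0.52, [0, 1], "A", "B", alpha=0.5, w_twitter=0.2)
    print(round(forecast, 3))  # prints 0.541
```

Keeping the polling average as the dominant term reflects the framing above, where polling is the baseline and social media data is meant to complement it; in practice the blend weight and smoothing factor would have to be fitted against past election outcomes rather than chosen by hand.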