Measuring political sentiment on Twitter: factor-optimal design for multinomial inverse regression

Matt Taddy
DOI: https://doi.org/10.48550/arXiv.1206.3776
2013-03-02
Abstract: This article presents a short case study in text analysis: the scoring of Twitter posts for positive, negative, or neutral sentiment directed towards particular US politicians. The study requires selection of a sub-sample of representative posts for sentiment scoring, a common and costly aspect of sentiment mining. As a general contribution, our application is preceded by a proposed algorithm for maximizing sampling efficiency. In particular, we outline and illustrate greedy selection of documents to build designs that are D-optimal in a topic-factor decomposition of the original text. The strategy is applied to our motivating dataset of political posts, and we outline a new technique for predicting both generic and subject-specific document sentiment through use of variable interactions in multinomial inverse regression. Results are presented for analysis of 2.1 million Twitter posts around February 2012.
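The greedy D-optimal selection mentioned in the abstract can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: it assumes each candidate post has already been reduced to a row of topic weights `V` (for example, from a fitted topic model), takes D-optimality to mean maximizing det(V_S^T V_S) over the chosen rows V_S, and adds a small `ridge` term (an assumption of this sketch) so the information matrix stays invertible before enough documents have been chosen. The function name `greedy_d_optimal` is hypothetical.

```python
import numpy as np


def greedy_d_optimal(V, k, ridge=1e-6):
    """Greedily pick k rows of V (documents x topic factors) that
    approximately maximize det(V_S^T V_S).

    The ridge * I starting matrix is an assumption of this sketch; it keeps
    M invertible before K documents have been selected.
    """
    n, K = V.shape
    M_inv = np.eye(K) / ridge          # inverse of the initial matrix ridge * I
    chosen = []
    available = np.ones(n, dtype=bool)
    for _ in range(k):
        # Matrix determinant lemma: det(M + v v^T) = det(M) * (1 + v^T M^{-1} v),
        # so the best next document maximizes the quadratic form v^T M^{-1} v.
        gains = np.einsum("ij,jk,ik->i", V, M_inv, V)
        gains[~available] = -np.inf    # never pick the same document twice
        i = int(np.argmax(gains))
        chosen.append(i)
        available[i] = False
        # Sherman-Morrison rank-one update of M^{-1} after adding row v.
        v = V[i]
        Mv = M_inv @ v
        M_inv -= np.outer(Mv, Mv) / (1.0 + v @ Mv)
    return chosen


# Toy topic weights for four posts over two topics.
V = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
print(greedy_d_optimal(V, 2))  # [0, 2]
```

On this toy input the greedy pick, posts 0 and 2, is also the pair with the largest determinant among all six pairs. The rank-one updates avoid recomputing a determinant for every candidate at every step; greedy selection is not guaranteed to find the globally D-optimal subset in general.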
Applications
What problem does this paper attempt to address?
The paper addresses two key aspects of text data analysis: selecting documents for costly sentiment annotation, and using inverse regression to predict sentiment.

1. **Selecting documents for sentiment annotation**: Sentiment analysis of Twitter posts requires hand-scoring a representative sub-sample drawn from a much larger pool of posts, a step that is usually time-consuming and expensive. The paper proposes a greedy algorithm that selects documents to build designs that are D-optimal in a topic-factor decomposition of the text, so that each scored post carries as much information as possible.
2. **Sentiment prediction**: The paper also proposes a technique for predicting document sentiment through variable interactions in multinomial inverse regression (MNIR). This lets the model predict both generic sentiment and subject-specific sentiment, such as sentiment directed towards a particular politician.

Together, these methods aim to reduce annotation cost and improve sentiment prediction on large-scale text datasets. The paper applies them to approximately 2.1 million Twitter posts from around February 2012 to measure sentiment directed towards particular US politicians.
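Once the scored sample is available, MNIR reduces each document to a few sufficient-reduction scores that a forward regression can map to sentiment. The sketch below shows only that projection step and rests on assumptions: the word loadings are taken as already estimated (the multinomial logistic regression of word counts on sentiment and its interactions is not shown), each score is computed as Φ^T f_i with f_i = x_i / m_i the word frequencies of post i, and Φ has one column for the generic sentiment effect and one for a sentiment-by-subject interaction (for instance, a hypothetical indicator for posts about one politician). The function name and toy loadings are hypothetical.

```python
import numpy as np


def mnir_projections(counts, phi_generic, phi_subject):
    """Sufficient-reduction scores for MNIR with one subject interaction.

    counts      : (n_docs, n_words) token counts x_i
    phi_generic : (n_words,) loadings for the generic sentiment effect
    phi_subject : (n_words,) loadings for sentiment interacted with a subject
    Returns an (n_docs, 2) array whose columns are [z_generic, z_subject].
    """
    counts = np.asarray(counts, dtype=float)
    m = counts.sum(axis=1, keepdims=True)      # document lengths m_i
    freqs = counts / np.maximum(m, 1.0)        # f_i = x_i / m_i; empty docs map to 0
    Phi = np.column_stack([phi_generic, phi_subject])
    return freqs @ Phi                         # z_i = Phi^T f_i for every document


# Two toy posts over a three-word vocabulary, with hypothetical loadings.
Z = mnir_projections([[2, 0, 0], [0, 1, 1]], [1.0, -1.0, 0.0], [0.0, 0.0, 2.0])
print(Z)  # rows: [1.0, 0.0] and [-0.5, 1.0]
```

In this setup, the generic column would feed a forward model for overall sentiment, while both columns together would inform sentiment towards the interacted subject; the choice of forward regression is left open here.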