Semi-supervised Stance-Topic Model for Stance Classification on Social Media.
Kang Xu, Sheng Bi, Guilin Qi
DOI: https://doi.org/10.1007/978-3-319-70682-5_13
2017-01-01
Abstract: Stance detection aims to automatically determine from a text whether its author is in favor of, against, or neutral towards an issue. Social media, such as Sina Weibo, reflects the general public's stances towards different issues. Detecting and summarizing stances towards specific issues from social media is an important and challenging task. Although stance detection on social media has been studied before, previous work, most of which is based on supervised learning, may not work well because it depends heavily on training data. Weakly supervised methods instead use heuristic rules to select posts with specific stances as training data, but the selected posts often concentrate on a few subtopics of the issue, so these methods can only train a biased stance classifier. To better detect stances towards specific issues, we detect stances using a small amount of labeled training data and a large amount of unlabeled data. In this paper, we propose a semi-supervised topic model, the Semi-Supervised Stance Topic Model (SSTM), that models the stances and topics of posts on social media. To integrate the supervised information into the model, we combine a discriminative maximum entropy (Max-Ent) component with the generative component. The Max-Ent component leverages hand-crafted features from labeled data to separate different stances. Since posts on social media are short texts, we also incorporate structural information about the posts, i.e., gender, location and time information, to aggregate posts and alleviate their context sparsity. The model is evaluated on selected Sina Weibo posts that discuss "the verbal battle of Han Han and Fang Zhouzi", with the task of classifying the stance of each post. Preliminary experiments show promising results for SSTM. Moreover, we analyze the common difficulties in stance detection on social media. Finally, we visualize the subtopics of the given issue generated by SSTM.
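The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Post` record, its field names, the monthly time bucket, and the choice to group on all three attributes jointly are assumptions made here for illustration. The idea shown is only that short posts sharing the same structural attributes are concatenated into longer pseudo-documents, which gives a topic model more word co-occurrence context.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Post(NamedTuple):
    """Hypothetical post record; field names are assumptions."""
    text: str
    gender: str
    location: str
    month: str  # assumed time bucket, e.g. "2012-01"


def aggregate_posts(posts: List[Post]) -> Dict[Tuple[str, str, str], str]:
    """Concatenate posts that share (gender, location, month)."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for post in posts:
        groups[(post.gender, post.location, post.month)].append(post.text)
    # Each group becomes one longer pseudo-document.
    return {key: " ".join(texts) for key, texts in groups.items()}


# Toy, made-up posts for illustration only.
posts = [
    Post("he wrote it himself", "F", "Beijing", "2012-01"),
    Post("the evidence is weak", "F", "Beijing", "2012-01"),
    Post("this looks ghostwritten", "M", "Shanghai", "2012-02"),
]
pseudo_docs = aggregate_posts(posts)
```

Grouping on the joint key is only one option; aggregating separately per attribute would be an equally plausible reading of the abstract.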