Mining Hidden Interests from Twitter Based on Word Similarity and Social Relationship for OLAP
Dongjin Yu, Yiyu Wu, Jingchao Sun, Zhiyong Ni, Youhuizi Li, Qing Wu, Xufeng Chen
DOI: https://doi.org/10.1142/s0218194017400113
IF: 1.007
2017-01-01
International Journal of Software Engineering and Knowledge Engineering
Abstract: Online Analytical Processing (OLAP) is an approach to answering multidimensional analytical (MDA) queries interactively. However, traditional OLAP approaches handle only structured data, not unstructured text such as tweets. To address this problem, we propose a Latent Dirichlet Allocation (LDA)-based model, called Multilayered Semantic LDA (MS-LDA), which detects hidden, layered interests in Twitter data. The resulting layered interest dimension can then be used to apply OLAP techniques to Twitter data. To improve its effectiveness, MS-LDA also exploits the semantic similarity among the words of tweets, computed with word2vec, and the social relationships among Twitter users. Extensive experiments demonstrate that MS-LDA can effectively extract the dimension hierarchy of Twitter users' interests for OLAP.
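To illustrate how a layered interest dimension could feed OLAP operations, the following is a minimal sketch of a roll-up and a slice over a two-level interest hierarchy. It is not the MS-LDA model itself: the hierarchy, the user names, the tweet assignments, and the function names `roll_up` and `slice_by_interest` are all hypothetical and chosen only for illustration. In practice, the sub-interest assigned to each tweet would come from a topic model rather than a hand-written list.

```python
from collections import Counter

# Hypothetical two-level interest hierarchy: sub-interest -> parent interest.
# These labels are illustrative only; they are not taken from the paper.
HIERARCHY = {
    "football": "sports",
    "tennis": "sports",
    "movies": "entertainment",
    "music": "entertainment",
}

# Toy fact table: (user, sub-interest assigned to one tweet).
# Assumption: a topic model would normally produce these assignments.
TWEET_FACTS = [
    ("alice", "football"),
    ("alice", "tennis"),
    ("alice", "music"),
    ("bob", "movies"),
    ("bob", "music"),
]


def roll_up(facts, hierarchy):
    """Aggregate tweet counts from the sub-interest level to the parent level."""
    counts = Counter()
    for user, sub_interest in facts:
        counts[(user, hierarchy[sub_interest])] += 1
    return dict(counts)


def slice_by_interest(rolled, interest):
    """Keep only the cells of the rolled-up cube for one parent interest."""
    return {user: n for (user, parent), n in rolled.items() if parent == interest}


rolled = roll_up(TWEET_FACTS, HIERARCHY)
print(rolled)
print(slice_by_interest(rolled, "sports"))
```

The roll-up moves from the finer level of the hierarchy to the coarser one, which is the kind of operation a dimension hierarchy makes possible; a drill-down would go the other way by grouping on the sub-interest directly.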