Predicting Microblog User's Age Based on Text Information

Ye Li, Tao Liu, Hongyan Liu, Jun He, Xiaoyong Du
DOI: https://doi.org/10.1007/978-3-642-41230-1_45
2013-01-01
Abstract: User age information plays a crucial role in many real applications such as precision marketing, targeted promotion and personalized recommendation. In this paper, we focus on predicting the age range of users on Sina Weibo. To protect user privacy, the available data consists only of basic user profile information and users' published messages (tweets), all of which are mapped to integers. From these opaque integers, we must uncover the underlying features or structures. Through analysis, we extract significant features related to age. To evaluate the correlation between basic user information and age ranges, we use mutual information as the measure. To handle the high dimensionality and data sparsity caused by the traditional word vector model of tweet contents, we propose aggregated tweet features corresponding to the different age ranges. Using these features, we compare many classification algorithms. The model based on a decision tree achieves the best prediction accuracy, up to 83%.
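The mutual information measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a profile attribute and the age range are both available as discrete, integer-coded labels per user; the function name `mutual_information` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from two equal-length sequences of discrete labels.

    Hypothetical helper for illustration: xs could be an integer-coded profile
    attribute and ys the age-range label of the same users.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0

    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (assumed): an attribute that fully determines the age range,
# and one that is independent of it.
age_range = [0, 0, 1, 1]
informative_attr = [7, 7, 9, 9]
uninformative_attr = [7, 9, 7, 9]

mi_informative = mutual_information(informative_attr, age_range)
mi_uninformative = mutual_information(uninformative_attr, age_range)
```

Under this estimate, an attribute with higher mutual information with the age range would be a stronger candidate feature; an attribute scoring near zero carries little information about age in the sample.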