Toward Early and Order-of-magnitude Cascade Prediction in Social Networks

Ruocheng Guo, Elham Shaabani, Abhinav Bhatnagar, Paulo Shakarian
DOI: https://doi.org/10.1007/s13278-016-0372-7
2016-01-01
Social Network Analysis and Mining
Abstract: When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to "viral" proportions, where "viral" can be defined as an order-of-magnitude increase? However, several previous studies have established that cascade size and frequency are related through a power law, which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on "structural diversity": the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate that these measures are able to distinguish viral from nonviral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class, despite this class comprising under 2% of samples. This significantly outperforms our baseline approach as well as the current state of the art. We also show that this approach performs well for identifying whether cascades observed for 60 min will grow to 500 reposts, and we demonstrate how to trade off between precision and recall.
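To make the notion of "structural diversity" concrete, the following is a minimal illustrative sketch, not the paper's actual feature set (the abstract does not specify the individual measurements). It assumes each user has already been assigned to a single community by some community-detection step; the function name `community_diversity`, the `community_of` mapping, and the choice of Shannon entropy as a diversity score are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def community_diversity(adopters, community_of):
    """Return (distinct community count, Shannon entropy) for a cascade.

    adopters: iterable of user ids that took part in the cascade so far.
    community_of: dict mapping user id -> community label (assumed given).
    """
    # Count how many adopters fall into each community.
    counts = Counter(community_of[u] for u in adopters)
    total = sum(counts.values())
    # Entropy in nats of the adopters' community distribution;
    # it is zero when every adopter belongs to the same community.
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return len(counts), entropy


# Hypothetical toy data: four early adopters spread over three communities.
community_of = {"a": 0, "b": 0, "c": 1, "d": 2}
n_communities, entropy = community_diversity(["a", "b", "c", "d"], community_of)
```

Values such as `n_communities` and `entropy`, computed over the first 50 reposts or the first 60 minutes of a cascade, are the kind of quantity that could serve as input features to a classifier for the viral versus nonviral task described above.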