Adaptive cost-sensitive stance classification model for rumor detection in social networks

Zahra Zojaji,Behrouz Tork Ladani
DOI: https://doi.org/10.1007/s13278-022-00952-2
2022-09-09
Social Network Analysis and Mining
Abstract:As online social networks are experiencing extreme popularity growth, determining the veracity of online statements denoted by rumors automatically as earliest as possible is essential to prevent the harmful effects of propagating misinformation. Early detection of rumors is facilitated by considering the wisdom of the crowd through analyzing different attitudes expressed toward a rumor (i.e., users' stances). Stance detection is an imbalanced problem as the querying and denying stances against a given rumor are significantly less than supportive and commenting stances. However, the success of stance-based rumor detection significantly depends on the efficient detection of "query" and "deny" classes. The imbalance problem has led the previous stance classifier models to bias toward the majority classes and ignore the minority ones. Consequently, the stance and subsequently rumor classifiers have been faced with the problem of low performance. This paper proposes a novel adaptive cost-sensitive loss function for learning imbalanced stance data using deep neural networks, which improves the performance of stance classifiers in rare classes. The proposed loss function is a cost-sensitive form of cross-entropy loss. In contrast to most of the existing cost-sensitive deep neural network models, the utilized cost matrix is not manually set but adaptively tuned during the learning process. Hence, the contributions of the proposed method are both in the formulation of the loss function and the algorithm for calculating adaptive costs. The experimental results of applying the proposed algorithm to stance classification of real Twitter and Reddit data demonstrate its capability in detecting rare classes while improving the overall performance. The proposed method improves the mean F -score of rare classes by about 13% in RumorEval 2017 dataset and about 20% in RumorEval 2019 dataset.
What problem does this paper attempt to address?