LGB: Language Model and Graph Neural Network-Driven Social Bot Detection

Ming Zhou, Dan Zhang, Yuandong Wang, Yangli-ao Geng, Yuxiao Dong, Jie Tang
2024-06-14
Abstract: Malicious social bots achieve their malicious purposes by spreading misinformation and inciting public opinion, seriously endangering social security, which makes their detection a critical concern. Recently, graph-based bot detection methods have achieved state-of-the-art (SOTA) performance. However, our research finds many isolated and poorly linked nodes in social networks, as shown in Fig. 1, which graph-based methods cannot effectively detect. To address this problem, our research focuses on effectively utilizing node semantics and network structure to jointly detect sparsely linked nodes. Given the excellent performance of language models (LMs) in natural language understanding (NLU), we propose a novel social bot detection framework, LGB, which consists of two main components: a language model (LM) and a graph neural network (GNN). Specifically, the social account information is first extracted into unified user textual sequences, which are then used to perform supervised fine-tuning (SFT) of the language model to improve its ability to understand social account semantics. Next, the semantically enriched node representation is fed into the pre-trained GNN to further enhance the node representation by aggregating information from neighbors. Finally, LGB fuses the information from both modalities to improve the detection performance of sparsely linked nodes. Extensive experiments on two real-world datasets demonstrate that LGB consistently outperforms state-of-the-art baseline models by up to 10.95%. LGB is already online at https://botdetection.aminer.cn/robotmain
Social and Information Networks, Computers and Society
What problem does this paper attempt to address?
This paper addresses the detection of malicious social bots, which spread false information and incite public opinion on social media and thereby pose a serious threat to social security. Existing graph-based detection methods achieve state-of-the-art performance overall, but they perform poorly on the many isolated or sparsely linked nodes in social networks, because these nodes lack sufficient social relationship information. To solve this problem, the paper proposes LGB, a social bot detection framework that combines a language model (LM) with a graph neural network (GNN) and jointly uses node semantics and network structure. LGB first extracts user attributes, personal descriptions, and tweets from each social account to form a unified user text sequence, then performs supervised fine-tuning of the language model on these sequences to improve its understanding of account semantics. Next, the semantically enhanced node representations are fed into a pre-trained graph neural network, which further enriches them by aggregating neighbor information. Finally, LGB fuses the text-semantic and network-structure modalities to improve detection of isolated and sparsely linked nodes. In experiments on two real-world datasets, LGB outperforms state-of-the-art baseline models by up to 10.95% in detection performance. LGB has also been deployed online and can be accessed through the URL given in the abstract.
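The three stages described above (text sequence construction, neighbor aggregation, and fusion) can be illustrated with a toy sketch. It is not the authors' implementation: the summary does not specify the text template, the LM, the GNN architecture, or the fusion operator, so the function names `build_user_sequence`, `aggregate_neighbors`, and `fuse`, the template string, the mean aggregation, and the concatenation-based fusion are all assumptions chosen for illustration. Hand-written vectors stand in for the embeddings a fine-tuned LM would produce.

```python
from typing import Dict, List


def build_user_sequence(attributes: Dict[str, object], description: str,
                        tweets: List[str], max_tweets: int = 3) -> str:
    """Flatten one account into a single text sequence (hypothetical template)."""
    attr_text = ", ".join(f"{key}: {value}" for key, value in attributes.items())
    tweet_text = " | ".join(tweets[:max_tweets])
    return f"attributes: {attr_text}. description: {description}. tweets: {tweet_text}"


def aggregate_neighbors(embeddings: List[List[float]],
                        neighbors: Dict[int, List[int]]) -> List[List[float]]:
    """One round of mean aggregation over each node's neighbors.

    Assumption: an isolated node (no neighbors) receives a zero vector,
    i.e. the graph contributes no information for it.
    """
    dim = len(embeddings[0])
    aggregated = []
    for node in range(len(embeddings)):
        nbrs = neighbors.get(node, [])
        if not nbrs:
            aggregated.append([0.0] * dim)
            continue
        aggregated.append(
            [sum(embeddings[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        )
    return aggregated


def fuse(text_emb: List[List[float]],
         graph_emb: List[List[float]]) -> List[List[float]]:
    """Fuse the two modalities by per-node concatenation (one possible choice)."""
    return [t + g for t, g in zip(text_emb, graph_emb)]


# Stand-ins for LM embeddings of three accounts; node 2 has no edges.
text_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
edges = {0: [1], 1: [0]}

graph_embeddings = aggregate_neighbors(text_embeddings, edges)
fused = fuse(text_embeddings, graph_embeddings)
```

In this toy setup the isolated node 2 gets an all-zero graph component but keeps its semantic component in the fused vector, which is the intuition behind combining the two modalities for sparsely linked accounts: a downstream classifier still has a usable signal when the graph offers none.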