AHDom: Algorithmically Generated Domain Detection Using Attribute Heterogeneous Graph Neural Network

Xiaoyan Hu, Di Li, Miao Li, Guang Cheng, Ruidong Li, Hua Wu
DOI: https://doi.org/10.1016/j.comnet.2024.110770
IF: 5.493
2024-01-01
Computer Networks
Abstract: Many cyber-attacks use Algorithmically Generated Domain (AGD) names to establish connections with command and control servers for subsequent attack behaviors. Identifying and blocking such AGDs helps detect and prevent attacks quickly. Traditional machine learning or deep learning detection methods rely only on individual domain features and struggle to accurately distinguish AGDs that attackers have crafted to evade detection. Thus, researchers leverage the inherent associated features among domains, clients, and resolved IP addresses to detect AGDs. In such research, heterogeneous graph neural networks are extensively employed. However, most existing methods rely on associated features, leading to inaccurate detection of isolated domain nodes. Besides, most existing detection methods employ transductive learning and are time-consuming. This paper proposes an AGD detection method, AHDom, to address these challenges. AHDom models DNS traffic as a Heterogeneous Information Network (HIN) to capture the intricate relationships between domains, clients, and resolved IP addresses. It also extracts character and behavior features as initial attributes of domains to obtain an Attribute HIN (AHIN), enhancing the detection accuracy of isolated domain nodes. Based on the AHIN, it combines meta-path random walk, the inductive learning algorithm GraphSAGE, and the attention mechanism to obtain effective embedding representations of domain nodes. Finally, it classifies domains based on these embedding representations. Our experimental results demonstrate that AHDom is superior to state-of-the-art methods in both the performance and the efficiency of detecting AGDs. AHDom achieves an average accuracy of 98.74% on our constructed dataset and requires only about 30.23% of the testing time of the best existing graph neural network approach.
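To make the graph modeling step more concrete, the following is a minimal sketch of how DNS records might be turned into a heterogeneous graph and traversed with a meta-path-guided random walk. It is not the authors' implementation: the record format, the sample data, and the names `build_hin` and `metapath_walk` are assumptions made for illustration, and the GraphSAGE aggregation and attention steps are not shown.

```python
import random
from collections import defaultdict

# Hypothetical DNS log records: (client, queried domain, resolved IP).
RECORDS = [
    ("c1", "example.com", "1.1.1.1"),
    ("c1", "xkqzjvw.net", "2.2.2.2"),
    ("c2", "xkqzjvw.net", "2.2.2.2"),
    ("c2", "qpwlzmx.org", "2.2.2.2"),
]


def build_hin(records):
    """Build typed adjacency sets: adj[(node_type, name)][neighbor_type]."""
    adj = defaultdict(lambda: defaultdict(set))
    for client, domain, ip in records:
        # Domain-client edges (a client queried a domain).
        adj[("domain", domain)]["client"].add(client)
        adj[("client", client)]["domain"].add(domain)
        # Domain-IP edges (a domain resolved to an IP address).
        adj[("domain", domain)]["ip"].add(ip)
        adj[("ip", ip)]["domain"].add(domain)
    return adj


def metapath_walk(adj, start_domain, metapath, walk_length, rng):
    """Random walk whose node types follow `metapath`, repeated cyclically.

    `metapath` must start and end with the same type, for example
    ("domain", "client", "domain") or ("domain", "ip", "domain").
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same type")
    walk = [(metapath[0], start_domain)]
    step_types = metapath[1:]
    step = 0
    while len(walk) < walk_length:
        next_type = step_types[step % len(step_types)]
        # Sort so that a seeded generator gives reproducible walks.
        neighbors = sorted(adj[walk[-1]][next_type])
        if not neighbors:
            break  # dead end, e.g. an isolated domain node
        walk.append((next_type, rng.choice(neighbors)))
        step += 1
    return walk


if __name__ == "__main__":
    hin = build_hin(RECORDS)
    rng = random.Random(0)
    print(metapath_walk(hin, "xkqzjvw.net", ("domain", "ip", "domain"), 5, rng))
    # A domain with no edges yields a walk containing only itself.
    print(metapath_walk(hin, "isolated.example", ("domain", "ip", "domain"), 5, rng))
```

In this sketch an isolated domain produces a one-node walk, so structure alone says nothing about it. That is the situation in which the abstract's per-domain character and behavior attributes would have to carry the classification.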