Detection of Algorithmically Generated Domain Names Using SMOTE and Hybrid Neural Network.

Yudong Zhang, Yuzhong Chen, Yangyang Lin, Yankun Zhang
DOI: https://doi.org/10.1007/978-981-15-1377-0_57
2019-01-01
Abstract: Domain generation algorithms (DGAs) use specific parameters as random seeds to generate large numbers of pseudo-random domain names in order to evade malicious domain name detection, which greatly increases the difficulty of detecting and defending against botnets and malware. State-of-the-art models for detecting algorithmically generated domain names generally analyze the statistical characteristics of domain names and build a classifier to identify the algorithmically generated ones. However, most current models require manual construction of feature sets for classification, are sensitive to imbalance in the sample distribution of the domain name dataset, and are difficult to adapt to frequent changes in the generation algorithms. To address these issues, we propose a hybrid model that combines a convolutional neural network (CNN) and a bidirectional long short-term memory network (BLSTM). First, because the number of DGA-generated domain names is relatively small and the sample distribution is unbalanced, which decreases detection accuracy, the borderline synthetic minority oversampling technique (borderline SMOTE) is employed to balance the domain name dataset. Second, a hybrid deep neural network that combines CNN and BLSTM is introduced to extract semantic and context-dependency features from the domain names. Experimental results on different domain name datasets demonstrate that the proposed model achieves significant improvement over state-of-the-art models with regard to precision and robustness.
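The oversampling step named in the abstract can be illustrated with a minimal sketch of the general borderline SMOTE idea: only minority samples that sit near the class boundary are interpolated with their minority neighbours. This is not the authors' implementation. The function name borderline_smote, the parameters m, k, per_sample and seed, and the toy 2-D vectors are all assumptions made for illustration; the abstract does not say how domain names are turned into numeric vectors before oversampling.

```python
import math
import random


def borderline_smote(minority, majority, m=5, k=3, per_sample=1, seed=0):
    """Return synthetic minority vectors built only from borderline samples.

    minority, majority: lists of equal-length numeric vectors (hypothetical
    features; here the minority class stands for DGA-generated names).
    """
    rng = random.Random(seed)
    # Minority samples come first, so index i means the same sample in both lists.
    labelled = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    synthetic = []
    for i, x in enumerate(minority):
        # m nearest neighbours of x in the whole dataset, excluding x itself.
        neigh = sorted(
            (j for j in range(len(labelled)) if j != i),
            key=lambda j: math.dist(x, labelled[j][0]),
        )[:m]
        n_major = sum(1 for j in neigh if labelled[j][1] == 0)
        # Borderline ("danger") sample: at least half, but not all, of the
        # neighbours are majority. All-majority is treated as noise.
        if not (m / 2 <= n_major < m):
            continue
        # k nearest neighbours of x inside the minority class only.
        mates = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        if not mates:
            continue
        for _ in range(per_sample):
            y = minority[rng.choice(mates)]
            gap = rng.random()  # position along the segment from x to y
            synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic


if __name__ == "__main__":
    # Toy vectors, invented for illustration only.
    dga = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
    benign = [[1.2, 1.0], [1.0, 1.3], [1.3, 1.2], [3.0, 3.0], [3.2, 3.1], [3.1, 2.9]]
    for point in borderline_smote(dga, benign, m=3, k=2):
        print(point)
```

In practice a maintained implementation such as the BorderlineSMOTE class in the imbalanced-learn package would normally be preferred over a hand-written loop; the sketch only makes the selection and interpolation steps visible.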