Unsupervised Clustering for a Comparative Methodology of Machine Learning Models to Detect Domain-Generated Algorithms Based on an Alphanumeric Features Analysis

Mohamed Hassaoui,Mohamed Hanini,Said El Kafhali
DOI: https://doi.org/10.1007/s10922-023-09793-6
2024-01-03
Journal of Network and Systems Management
Abstract:Domain Generation Algorithms (DGAs) are often used for generating huge amounts of domain names to maintain command and control between the infected computer and the bot master. By establishing as needed a great number of domain names, attackers may mask their C2 servers and escape detection. Many malware families have switched to a stealthier contact approach. Therefore, the traditional methods become ineffective. Over the past decades, many researches have started to use artificial intelligence to create systems able to detect DGA in traffic, but these works do not use the same data to evaluate their models. This article proposes a comparative methodology to compare machine learning models based on unsupervised clustering and then applied this methodology to study the best models belonging to neural network methods and traditional machine learning methods to detect DGAs. We extracted 21 linguistic features based on the analysis of alphanumeric and n-gram, we studied the correlation between these features in order to reduce their number. We examine in detail those Machine learning algorithms and we discuss the drawbacks and strengths of each method with specific classes of DGA to propose a new switch case model that could be always reliable to detect DGAs.
computer science, information systems,telecommunications
What problem does this paper attempt to address?