Abstract: Botnets are networks of compromised machines that cybercriminals increasingly use to launch attacks. Traditional defenses such as blocklisting become ineffective because domain generation algorithms (DGAs) periodically and rapidly produce new malicious domain names to maintain contact with command and control (C&C) servers. Deep learning and machine learning are candidate solutions. Deep learning methods achieve high accuracy but take more time. Machine learning methods train quickly, which suits the frequent retraining needed to keep accuracy high. However, existing machine learning solutions cannot precisely capture the linguistic characteristics of domain names, which causes many false positives. To model domain name strings more comprehensively, we present the DOmain Linguistic PHonIcs detectioN (DOLPHIN) method, a novel method for detecting DGA-based botnets. Considering the detection context and the correspondence between the pronunciation and spelling of words, we design DOLPHIN patterns, which classify variable-length vowels and consonants following the principles of phonics. Based on these patterns, a novel matching automaton reconstructs domain names as sequences of variable-length vowel and consonant components. From the reconstructed domain names, DOLPHIN extracts phonics-based features. We implement DOLPHIN with supervised learning methods and compare it with the state-of-the-art methods FANCI, HAGDetector, and LSTM.MI. The experimental results show that, compared with FANCI using random forests, DOLPHIN improves detection accuracy by 0.0265 and lowers both the false positive rate (FPR) and the false negative rate (FNR) without adding much overhead. DOLPHIN also generalizes to other real-world data sources, reducing the FPR by 0.0801 (62.97%) compared with FANCI. DOLPHIN can be combined with most linguistic features and improves performance over existing linguistic feature-based methods.
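
To make the idea of variable-length vowel and consonant components concrete, the following is a minimal Python sketch. It is not the DOLPHIN pattern set or its matching automaton, which the abstract does not specify. It assumes a much simpler rule: a label is split into maximal runs of the letters a, e, i, o, u (vowels) and of all other ASCII letters (consonants, with "y" treated as a consonant). The function names `split_runs` and `run_features` and the three features are hypothetical, chosen only for illustration.

```python
import re

VOWELS = "aeiou"

# Maximal runs of vowels, consonants, or anything else (digits, hyphens, ...).
# Assumption: "y" counts as a consonant; real phonics rules are richer.
RUN_PATTERN = re.compile(r"[aeiou]+|[b-df-hj-np-tv-z]+|[^a-z]+")


def split_runs(label: str) -> list[tuple[str, str]]:
    """Split a domain label into runs tagged 'V', 'C', or 'O' (other)."""
    runs = []
    for match in RUN_PATTERN.finditer(label.lower()):
        text = match.group()
        first = text[0]
        if first in VOWELS:
            kind = "V"
        elif "a" <= first <= "z":
            kind = "C"
        else:
            kind = "O"
        runs.append((kind, text))
    return runs


def run_features(label: str) -> dict[str, float]:
    """Compute a few illustrative features from the vowel/consonant runs."""
    runs = split_runs(label)
    vowel_chars = sum(len(text) for kind, text in runs if kind == "V")
    consonant_chars = sum(len(text) for kind, text in runs if kind == "C")
    letters = vowel_chars + consonant_chars
    longest_consonant_run = max(
        (len(text) for kind, text in runs if kind == "C"), default=0
    )
    return {
        "num_runs": len(runs),
        "vowel_ratio": vowel_chars / letters if letters else 0.0,
        "longest_consonant_run": longest_consonant_run,
    }


if __name__ == "__main__":
    for name in ("google", "xkqzt7"):
        print(name, split_runs(name), run_features(name))
```

A pronounceable label such as "google" alternates short vowel and consonant runs, while a random-looking label such as "xkqzt7" yields one long consonant run and no vowels. Features of this kind could be fed to a supervised classifier such as a random forest, but the feature set used by DOLPHIN itself may differ.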

Using Extended Character Feature in Bi-LSTM for DGA Domain Name Detection

Detection Method of Domain Names Generated by DGAs Based on Semantic Representation and Deep Neural Network

CNN-based DGA Detection with High Coverage

ReplaceDGA: BiLSTM-Based Adversarial DGA With High Anti-Detection Ability

DGA botnet detection method based on capsule network and k-means routing

Botnet DGA Domain Name Classification Using Transformer Network with Hybrid Embedding

Domain-Embeddings Based DGA Detection with Incremental Training Method

HAGDetector: Heterogeneous DGA domain name detection model

Detecting DGA-based botnets through effective phonics-based features

Detecting Domain Names Generated by DGAs With Low False Positives in Chinese Domain Names

Detecting DGA domains with recurrent neural networks and side information

Uit-DGAdetector: detect domains generated by algorithms using machine learning

Far from Classification Algorithm: Dive into the Preprocessing Stage in DGA Detection

D3N: DGA Detection with Deep-Learning Through NXDomain

Real-Time Detection of Dictionary DGA Network Traffic using Deep Learning

GSAM: A Deep Neural Network Model for Extracting Computational Representations of Chinese Addresses Fused with Geospatial Feature

KDTM: Multi-Stage Knowledge Distillation Transfer Model for Long-Tailed DGA Detection

Fine-tuning Large Language Models for DGA and DNS Exfiltration Detection

LLMs for Domain Generation Algorithm Detection

DAG-based Long Short-Term Memory for Neural Word Segmentation