Illegal Domain Name Generation Algorithm Based on Character Similarity of Domain Name Structure

Yuchen Liang, Yanan Cheng, Zhaoxin Zhang, Tingting Chai, Chao Li
DOI: https://doi.org/10.3390/app13064061
2023-01-01
Applied Sciences
Abstract: Detecting and controlling illegal websites (gambling and pornography sites) through illegal domain names remains an unsolved problem, so mining and discovering potential illegal domain names in advance has become an active research topic. This paper studies a method of generating illegal domain names based on the character similarity of domain name structure. First, the K-means algorithm clusters illegal domain names with similar structures. Then, the resulting clusters are fed into a generative adversarial network (GAN) for training. Finally, under a specific result verification method, experiments show that the generation algorithm achieves an average concentration of 23.82%, an effective concentration of 63.54%, and an expansion rate of 7.5. Compared with the enumeration algorithm, the generation algorithm is substantially better in both generation efficiency and accuracy.
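The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the structural features (label length, digit ratio, letter ratio, hyphen count), the deterministic farthest-point initialization, and the sample names under the reserved `.example` TLD are all assumptions chosen for illustration, and the GAN training step is not shown.

```python
def structure_features(domain):
    """Map a domain's first label to a structural feature vector.

    The feature choice is hypothetical; the abstract does not specify one.
    """
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    digits = sum(c.isdigit() for c in label)
    letters = sum(c.isalpha() for c in label)
    return [float(n), digits / n, letters / n, float(label.count("-"))]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain K-means with farthest-point initialization (an assumption,
    used here so the result is deterministic)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical sample names: all-digit labels versus long alphabetic labels.
sample_domains = [
    "88123.example", "77456.example", "99001.example",
    "luckyplayonline.example", "bestcardplayers.example", "superspinmachine.example",
]
sample_labels, _ = kmeans([structure_features(d) for d in sample_domains], k=2)
for domain, label in zip(sample_domains, sample_labels):
    print(label, domain)
```

In a pipeline like the one the abstract describes, each resulting cluster would then serve as a separate training set for the generative model, so that generated names follow the structure of one cluster at a time.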