Analysis of Domain Name Queries Based on the K-Means Algorithm

JI Cheng, LI Xiaodong, YUAN Jian, YUCHI Xuebiao, SHAN Xiuming
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2010.04.033
2010-01-01
Abstract: A full day's queries looking up the IP addresses associated with CN domain names were investigated to study Internet access patterns. The queries were collected from the authoritative CN name servers run by the China Internet Network Information Center. A data compression method was designed that reduces the volume of data while retaining the valid information about which websites users visit. Feature vectors describing the temporal behavior of IP addresses and of domain names were clustered with the k-means algorithm. The results show that, according to differences in temporal behavior, IP addresses fall into three main clusters (attackers, the main ISP's recursive servers, and other recursive servers), while domain names fall into four main clusters. Further clustering of the domain names queried by a large number of users identifies the domain names that truly reflect the needs of the majority of users.
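The abstract does not say how the temporal feature vectors are built or how k-means is configured. The following is a minimal sketch under stated assumptions: each IP address (or domain name) is described by a 24-bin hourly query histogram normalized to sum to 1, and the vectors are grouped with plain Lloyd's k-means using Euclidean distance. The record layout `(hour, client_ip, domain)`, the function names `hourly_profiles` and `kmeans`, and the sample addresses and domains are all hypothetical, and the paper's data compression step is not reproduced.

```python
import math
from collections import defaultdict


def hourly_profiles(records, by="ip"):
    """Turn (hour, client_ip, domain) records into normalized 24-bin profiles.

    Assumed feature: the share of an entity's queries falling in each hour.
    """
    counts = defaultdict(lambda: [0] * 24)
    for hour, client_ip, domain in records:
        entity = client_ip if by == "ip" else domain
        counts[entity][hour] += 1
    # Normalize so entities with different query volumes are comparable.
    return {e: [c / sum(vec) for c in vec] for e, vec in counts.items()}


def kmeans(points, k, iters=100):
    """Lloyd's k-means with Euclidean distance and farthest-first seeding."""
    # Seeding: start from the first point, then repeatedly add the point
    # farthest from every centroid chosen so far (deterministic).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy log: two night-active and two day-active resolvers.
records = [
    (2, "10.0.0.1", "a.example.cn"), (2, "10.0.0.1", "b.example.cn"),
    (3, "10.0.0.1", "a.example.cn"),
    (2, "10.0.0.2", "a.example.cn"), (3, "10.0.0.2", "b.example.cn"),
    (3, "10.0.0.2", "a.example.cn"),
    (14, "10.0.0.3", "c.example.cn"), (15, "10.0.0.3", "c.example.cn"),
    (15, "10.0.0.3", "b.example.cn"),
    (14, "10.0.0.4", "c.example.cn"), (14, "10.0.0.4", "a.example.cn"),
    (15, "10.0.0.4", "c.example.cn"),
]
profiles = hourly_profiles(records, by="ip")
names = list(profiles)
labels, _ = kmeans([profiles[n] for n in names], k=2)
print(dict(zip(names, labels)))
# prints {'10.0.0.1': 0, '10.0.0.2': 0, '10.0.0.3': 1, '10.0.0.4': 1}
```

Farthest-first seeding is used here only to make the toy example reproducible. It is sensitive to outliers such as attack traffic, so on real query logs random or k-means++ seeding with several restarts is the more usual choice.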