Network Traffic Classification Based on Semi-Supervised Clustering

Guan-zhou LIN, Yang XIN, Xin-xin NIU, Hui-bai JIANG
DOI: https://doi.org/10.1016/s1005-8885(09)60577-x
2010-01-01
Abstract: The diminished accuracy of port-based and payload-based classification motivates the use of transport-layer statistics for network traffic classification. This paper proposes a semi-supervised clustering approach, based on an improved K-Means clustering algorithm, to partition a training set of network flows that contains a large number of unlabeled flows and only a few labeled flows. Instead of selecting the initial cluster centers at random, the algorithm uses the variance of the flow attributes to initialize them. The few labeled flows are then used to construct a mapping from the clusters to the predefined set of traffic classes. Experimental results show that the proposed algorithm achieves better overall accuracy and a better sum of squared errors (SSE) value than the standard K-Means algorithm defined in Ref. [5].
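
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state exactly how the attribute variance is turned into initial centers, so the rule used here is an assumption: take the attribute with the largest variance, sort the flows along it, split them into k contiguous groups, and use each group's mean as a starting center. The cluster-to-class mapping by majority vote of the labeled flows is likewise an assumption, and the function names (variance_init, kmeans, map_clusters) and the use of -1 for "unlabeled" are hypothetical, not taken from the paper.

```python
import numpy as np

def variance_init(X, k):
    """Deterministic initial centers (assumed rule, not the paper's exact one)."""
    attr = np.argmax(X.var(axis=0))      # attribute with the largest variance
    order = np.argsort(X[:, attr])       # sort flows along that attribute
    groups = np.array_split(order, k)    # k contiguous groups of flow indices
    return np.array([X[g].mean(axis=0) for g in groups])

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.astype(float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every flow to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[assign == j]
            if len(members):             # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = dists.argmin(axis=1)
    sse = dists[np.arange(len(X)), assign].sum()   # sum of squared errors
    return centers, assign, sse

def map_clusters(assign, labels, k, unknown=-1):
    """Map each cluster to the majority class among its labeled flows."""
    mapping = {}
    for j in range(k):
        known = labels[(assign == j) & (labels != unknown)]
        if len(known):
            vals, counts = np.unique(known, return_counts=True)
            mapping[j] = int(vals[counts.argmax()])
        else:
            mapping[j] = unknown         # no labeled flow fell into this cluster
    return mapping
```

With a feature matrix X of per-flow statistics and an integer label array in which most entries are -1, calling kmeans(X, variance_init(X, k)) and then map_clusters(assign, labels, k) yields a class for every cluster. A new flow would then be classified by assigning it to its nearest center and looking up that cluster's class.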