A Statistical-Feature-based Approach to Internet Traffic Classification Using Machine Learning.

Shijun Huang, Kai Chen, Chao Liu, Alei Liang, Haibing Guan
DOI: https://doi.org/10.1109/icumt.2009.5345539
2009-01-01
Abstract: Internet traffic classification using machine learning has been an emerging research field since the 1990s and is now widely used in numerous network activities. The technique models attributes and features of data flows to identify the applications that generate them. In this paper we design and implement a classification model based on header-derived statistical flow features. Compared with traditional methods, the model is entirely insensitive to port numbers and application-level payload contents, and so avoids the operational difficulties caused by unreliable port numbers and the complexity of payload interpretation. Rather than relatively complex ML algorithms or combinations of them, a supervised k-Nearest Neighbor estimator is adopted for computational efficiency, together with effective, easy-to-calculate statistical features selected according to the operational background. Our results indicate that about 90% per-flow classification accuracy can be achieved, a vast improvement over traditional techniques, which achieve 50-70%.
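As an illustration of the general idea only, the following minimal sketch classifies flows with a k-Nearest Neighbor vote over header-derived statistics. It is not the authors' implementation: the four features (mean and standard deviation of packet size, mean inter-arrival time, packet count), the class labels, the toy flows, and the function names `flow_features` and `knn_predict` are all assumptions made for this example, since the abstract does not list the paper's feature set.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def flow_features(packets):
    """Turn one flow into a feature vector using header data only.

    packets: list of (timestamp_seconds, size_bytes) tuples.
    The feature choice here is hypothetical, not taken from the paper.
    """
    sizes = [size for _, size in packets]
    times = [ts for ts, _ in packets]
    # Inter-arrival gaps; a single-packet flow falls back to one zero gap.
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return [mean(sizes), pstdev(sizes), mean(gaps), float(len(packets))]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to query.

    train: list of (feature_vector, label) pairs.
    Features are left unscaled for brevity, so packet size dominates
    the Euclidean distance; a real system would normalize them first.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy labeled flows (invented for illustration): large, closely spaced
# packets versus small, widely spaced packets.
labeled_flows = [
    ([(0.00, 1400), (0.01, 1500), (0.02, 1500), (0.03, 1450)], "bulk"),
    ([(0.00, 1500), (0.02, 1500), (0.04, 1480)], "bulk"),
    ([(0.0, 80), (0.5, 120), (1.1, 90)], "interactive"),
    ([(0.0, 60), (0.4, 100), (0.9, 70), (1.5, 110)], "interactive"),
]
train = [(flow_features(pkts), label) for pkts, label in labeled_flows]

unknown_flow = [(0.0, 100), (0.6, 90), (1.0, 85)]
predicted = knn_predict(train, flow_features(unknown_flow), k=3)
print(predicted)
```

Nothing in the sketch reads port numbers or payload bytes, which is the property the abstract emphasizes; only packet sizes and timestamps are used.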