An Adaptive Classification Approach Based on Information Entropy for Network Traffic in Presence of Concept Drift
Wu-Bin PAN,Guang CHENG,Xiao-Jun GUO,Shun-Xiang HUANG
DOI: https://doi.org/10.11897/SP.J.1016.2017.01556
2017-01-01
Chinese Journal of Computers
Abstract: In recent years, machine learning-based traffic classification has achieved high accuracy. Nevertheless, it depends heavily on the environment in which the training samples were collected. Although a classifier can be trained accurately in a given network environment, its accuracy declines sharply when it has to classify traffic from varying network conditions. Because traffic statistics and distributions change dynamically, machine learning-based classifiers must be updated periodically to maintain performance. This issue is unavoidable for machine learning-based traffic classification. Existing solutions lack explicit recommendations on when a classifier should be updated and how to update it effectively. This results in several shortcomings: (1) Updating a traditional traffic classifier is time-consuming, which makes it critical to decide how often a classifier should be updated and when a new classifier is needed. (2) Training a new classifier only on new traffic loses some of the knowledge already learned, while updating a classifier on a large data set that combines all collected data also affects performance. (3) Traffic statistics and distributions change dynamically across network conditions, so it is hard to obtain a stable feature subset on which to build a robust classifier. Building a classifier that adapts to changing network conditions is therefore a major challenge.

In this paper, we develop an adaptive traffic classification approach that uses entropy-based detection and incremental ensemble learning, assisted by embedded feature selection. To update the classifier in a timely and effective manner, the entropy-based detection uses a sliding window technique to measure the statistical difference between the previous and current traffic samples by counting and comparing all instances with respect to their feature stream membership. Additionally, we discretize the range of feature values into a fixed number of bins to take the approximate value distribution into account. The incremental ensemble learning scheme retains previously trained classifiers, introduces a classifier retrained on current traffic, and removes classifiers whose performance has degraded. Furthermore, several feature selectors are integrated to obtain feature subsets that generalize robustly. A comprehensive performance evaluation conducted on two real-world network traffic data sets shows that our approach can effectively detect concept drift under changing network conditions and update the classifier with high accuracy and generalization ability.

The major contributions of this work are summarized as follows. First, this paper presents an adaptive traffic classification system based on concept drift detection. Information entropy is used to detect concept drift from the entropy change of feature attributes, and this detection method does not require class information for the flows. Second, the classifiers are updated according to the result of concept drift detection rather than at a fixed period. Third, the method uses an ensemble learning strategy that introduces classifiers built on new samples and eliminates classifiers whose performance has degraded, in order to optimize the classification model. Fourth, mutual information is introduced to evaluate features for concept drift detection. The results show that the mutual information between packet size and protocol is high and stable, which indicates that this feature is suitable for concept drift detection. Fifth, this paper uses the Hoeffding bound to determine the window size for concept drift detection. An appropriate window size is important for fast and effective concept drift detection.
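The entropy-based detection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal-width binning, the bin-weighted binary entropy of window membership, and the drift threshold of 0.9 are all assumptions made for illustration. The window-size helper simply solves the standard Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), for n, which is one plausible reading of how the bound could guide the window size.

```python
import math


def membership_entropy(previous, current, num_bins=10):
    """Bin-weighted binary entropy of window membership for one feature.

    Returns a value in [0, 1]: close to 1 when both windows fill the bins
    in the same proportions, close to 0 when they occupy different bins.
    No class labels are needed, only the feature values.
    """
    values = list(previous) + list(current)
    lo, hi = min(values), max(values)
    # Equal-width bins (an assumption); fall back to 1.0 for constant data.
    width = (hi - lo) / num_bins or 1.0

    def bin_index(v):
        return min(int((v - lo) / width), num_bins - 1)

    prev_counts = [0] * num_bins
    curr_counts = [0] * num_bins
    for v in previous:
        prev_counts[bin_index(v)] += 1
    for v in current:
        curr_counts[bin_index(v)] += 1

    total = len(values)
    entropy = 0.0
    for p, c in zip(prev_counts, curr_counts):
        n = p + c
        if n == 0:
            continue  # empty bins carry no weight
        h = 0.0
        for k in (p, c):
            if k:
                q = k / n
                h -= q * math.log2(q)
        entropy += (n / total) * h
    return entropy


def drift_detected(previous, current, threshold=0.9, num_bins=10):
    """Flag drift when the membership entropy falls below a threshold.

    The threshold value is hypothetical and would need tuning.
    """
    return membership_entropy(previous, current, num_bins) < threshold


def hoeffding_window_size(epsilon, delta, value_range=1.0):
    """Smallest n for which the Hoeffding bound is at most epsilon.

    With probability 1 - delta, the mean of n observations with range
    value_range lies within epsilon of the true mean.
    """
    n = (value_range ** 2) * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    old_window = [60, 64, 70, 1500, 1480, 66, 62, 1490]  # packet sizes
    new_window = [600, 640, 580, 620, 610, 590, 630, 605]
    print(membership_entropy(old_window, new_window))
    print(drift_detected(old_window, new_window))
    print(hoeffding_window_size(epsilon=0.1, delta=0.05))
```

In this sketch both windows are assumed to have the same length, so an entropy of 1 corresponds to identical binned distributions. A full detector would slide the windows over the flow stream and combine the scores of several features.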