CARD-B: A stacked ensemble learning technique for classification of encrypted network traffic

ThankGod Obasi, M. Omair Shafiq
DOI: https://doi.org/10.1016/j.comcom.2022.02.006
IF: 5.047
2022-02-01
Computer Communications
Abstract: Classifying network traffic by application, service, or type is critical for network service providers that need to monitor networks and maintain Quality of Service (QoS). As technology evolves and concerns about security and user privacy grow, encryption has become commonplace. Encrypted traffic is difficult to monitor with classical network monitoring techniques and tools, and for security and privacy reasons the data cannot be decrypted. Because the dynamics of encrypted traffic cannot be interpreted directly, classifying it is a major challenge. Internet Service Providers, who deliver varying levels of Quality of Service to clients, must therefore monitor traffic while respecting these security and privacy constraints. Machine learning and deep learning techniques are being used to classify encrypted network traffic. This paper presents an ensemble learning technique built on existing data pre-processing, machine learning, and deep learning techniques. We examine different models and identify the most relevant statistical features of encrypted network traffic for classifying non-VPN encrypted traffic. Multiple experiments led us to develop an ensemble learning model, based on existing deep learning and machine learning models, for this classification task. The proposed solution, named CARD-B, is composed of Capsule Neural Networks, Artificial Neural Networks, Random Forest, and Decision Trees, along with Boosting techniques such as Adaptive Boosting and Extreme Gradient Boosting. These models are stacked using a Random Forest classifier. Experimental results show that the proposed model achieved an overall accuracy of 96% and an AUC of 98% using different extracted statistical features.
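The stacking arrangement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of real traffic features, substitutes `MLPClassifier` for the Artificial Neural Network and `GradientBoostingClassifier` for Extreme Gradient Boosting, and omits the Capsule Neural Network because scikit-learn has no equivalent. All hyperparameters are arbitrary placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for statistical flow features (NOT the paper's dataset).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Base learners loosely mirroring the CARD-B components.
# Assumptions: MLP stands in for the ANN, gradient boosting stands in for
# Extreme Gradient Boosting, and the capsule network is left out.
base_learners = [
    ("ann", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# A Random Forest meta-learner is trained on the base learners'
# cross-validated class-probability outputs.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    cv=3,
)
stack.fit(X_train, y_train)

accuracy = accuracy_score(y_test, stack.predict(X_test))
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The printed accuracy reflects only the synthetic data and says nothing about the 96% accuracy reported in the abstract. Using cross-validated predictions (`cv=3`) as the meta-learner's training input is a common way to keep the meta-learner from simply memorising base-learner outputs on data those learners have already seen.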
computer science, information systems; telecommunications; engineering, electrical & electronic
What problem does this paper attempt to address?