A study on the application of the T5 large language model in encrypted traffic classification

Jian Luo, Zechao Chen, Wenxiong Chen, Huali Lu, Feng Lyu
DOI: https://doi.org/10.1007/s12083-024-01817-5
IF: 3.488
2024-11-28
Peer-to-Peer Networking and Applications
Abstract: In the era of mobile Internet, the widespread use of VPNs increases the demand for data security and privacy but also poses challenges for ISPs in terms of quality of service and traffic monitoring. This paper focuses on how to classify encrypted traffic accurately. Traditional methods usually require manual labeling of features, which is costly and yields unstable accuracy. Because of the special characteristics of encrypted traffic, traditional labeling methods adapt poorly, so new solutions are urgently needed. This paper adopts a generative learning method based on large-scale language models, which fuses encrypted traffic features into the T5 language model. The fine-tuned T5 model performs transfer learning with a small amount of data and achieves good classification accuracy. Compared with traditional methods, the model classifies more effectively: it can classify encrypted traffic even with a small number of samples and can distinguish between VPN and non-VPN traffic. Test results on the ISCX VPN-nonVPN dataset show that the new generative classifier improves the F1 score to 98.5%, which is a 5.5% improvement compared to the previous one. The experiments show that the method is effective and efficient.
computer science, information systems, telecommunications
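
The generative framing described in the abstract, where a text-to-text model emits the class label as text and is scored by F1, can be illustrated with a minimal sketch. The abstract does not specify how traffic features are fused into T5 or which F1 averaging is used, so everything here is an assumption for illustration: the hex-token serialization, the `classify traffic:` prefix, the helper names `packet_to_text`, `make_example` and `macro_f1`, and the choice of macro averaging are all hypothetical and are not the authors' method.

```python
def packet_to_text(payload: bytes, max_bytes: int = 16) -> str:
    """Render the first max_bytes of a payload as space-separated hex tokens.

    Assumption: hex tokens are one plausible way to turn raw bytes into text.
    """
    return " ".join(f"{b:02x}" for b in payload[:max_bytes])


def make_example(payload: bytes, label: str) -> dict:
    """Build one text-to-text training pair: source text in, label text out.

    The prompt prefix is hypothetical; the paper's input format is not
    given in the abstract.
    """
    return {
        "source": "classify traffic: " + packet_to_text(payload),
        "target": label,
    }


def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == lab and p == lab)
        fp = sum(1 for t, p in pairs if t != lab and p == lab)
        fn = sum(1 for t, p in pairs if t == lab and p != lab)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data only; these are not results from the paper.
example = make_example(b"\x16\x03\x01\x02", "vpn")
score = macro_f1(
    ["vpn", "vpn", "nonvpn", "nonvpn"],
    ["vpn", "nonvpn", "nonvpn", "nonvpn"],
)
```

In a real pipeline, pairs like `example` would be tokenized and passed to a sequence-to-sequence fine-tuning loop, and the generated label strings would be compared against the reference labels with a metric such as `macro_f1`.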