Self-Supervised Traffic Classification: Flow Embedding and Few-Shot Solutions

Eyal Horowicz, Tal Shapira, Yuval Shavitt
DOI: https://doi.org/10.1109/tnsm.2024.3366848
2024-01-01
IEEE Transactions on Network and Service Management
Abstract: Internet traffic classification has been intensively studied over the past decade due to its importance for traffic engineering and cyber security. A promising approach to several traffic classification problems is the FlowPic approach, where histograms of packet sizes in consecutive time slices are transformed into a picture that is fed into a Convolutional Neural Network (CNN) model for classification. However, CNNs (the FlowPic approach included) require a relatively large labeled flow dataset, which is not always easy to obtain. In this paper, we show that this obstacle can be overcome by using Contrastive Representation Learning to learn, from an unlabeled flow dataset, a flow representation that can be embedded in a latent space in which flows belonging to the same class cluster together. We then show that by using just a few labeled flows (a few shots) from each class, we can achieve high accuracy in flow classification. We show that common picture augmentation techniques can help, but accuracy improves further when introducing augmentation techniques that mimic network behavior, such as changes in the RTT (round-trip time). Finally, we show that we can replace the large FlowPics suggested in the past with much smaller mini-FlowPics and achieve two advantages: improved model performance and easier engineering. Interestingly, this even improves accuracy in some cases.
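To make the FlowPic idea concrete, the following is a minimal sketch, not the authors' implementation, of how a flow could be turned into a small packet-size-by-time histogram and how a network-style augmentation might be applied before building it. The function names `flowpic` and `rtt_like_augment`, the 32x32 resolution, the 15-second window, and the 1500-byte size cap are all assumptions chosen for illustration; the abstract does not specify them. Scaling all inter-arrival times by a constant factor is one simple way to imitate a change in RTT and may differ from the procedure used in the paper.

```python
import numpy as np


def flowpic(times, sizes, duration=15.0, max_size=1500, bins=32):
    """Build a 2D histogram of a flow (assumed parameters, see lead-in).

    Rows are packet-size bins, columns are time bins. Packets that arrive
    at or after `duration` seconds from the first packet, or that are
    larger than `max_size` bytes, are dropped.
    """
    times = np.asarray(times, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    if times.size == 0:
        return np.zeros((bins, bins))
    t = times - times.min()  # time relative to the first packet
    keep = (t < duration) & (sizes <= max_size)
    hist, _, _ = np.histogram2d(
        sizes[keep],
        t[keep],
        bins=bins,
        range=[[0, max_size], [0, duration]],
    )
    return hist


def rtt_like_augment(times, factor):
    """Stretch (factor > 1) or compress (factor < 1) inter-arrival times.

    This is a hypothetical stand-in for an augmentation that mimics a
    change in round-trip time; packet sizes are left untouched.
    """
    times = np.asarray(times, dtype=float)
    if times.size == 0:
        return times
    return times[0] + (times - times[0]) * factor


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 14.0]
    sizes = [100, 1500, 700, 60]
    original = flowpic(times, sizes)
    stretched = flowpic(rtt_like_augment(times, 2.0), sizes)
    print(original.shape, original.sum(), stretched.sum())
```

In a contrastive setup, two differently augmented views of the same flow (for example, the original and the stretched picture) would be treated as a positive pair, so no class labels are needed at this stage.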
computer science, information systems