Clustering Network Traffic Using Semi-Supervised Learning

Antonina Krajewska, Ewa Niewiadomska-Szynkiewicz
DOI: https://doi.org/10.3390/electronics13142769
IF: 2.9
2024-07-15
Electronics
Abstract: Clustering algorithms play a crucial role in early warning cybersecurity systems. They allow for the detection of new attack patterns and anomalies and enhance system performance. This paper discusses the problem of clustering data collected by a distributed system of network honeypots. In the proposed approach, when a network flow matches an attack signature, an appropriate label is assigned to it. This enables the use of semi-supervised learning algorithms and improves the quality of clustering results. The article compares clustering results obtained with and without partial supervision, specifically with non-negative matrix factorization and semi-supervised non-negative matrix factorization. Our results confirm the positive impact of labeling a portion of flows on the quality of clustering.
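To make the idea of partial supervision concrete, here is a minimal sketch of one common semi-supervised NMF formulation: a label-regularized objective, ||X - WH||² + lam·||M∘(Y - BH)||², fitted with multiplicative updates. It is not taken from the paper, whose exact variant may differ. The function name `ssnmf`, the weight `lam`, the synthetic "flow feature" matrix, and the choice of which columns are labeled are all illustrative assumptions.

```python
import numpy as np

def ssnmf(X, Y, labeled, k, lam=1.0, n_iter=200, seed=0, eps=1e-9):
    """Label-regularized NMF sketch (hypothetical helper, not the paper's code).

    X: (features, n) non-negative data, one column per flow.
    Y: (classes, n) one-hot labels; columns of unlabeled flows are ignored.
    labeled: (n,) boolean mask, True where a flow matched a signature.
    """
    rng = np.random.default_rng(seed)
    n_features, n = X.shape
    n_classes = Y.shape[0]
    W = rng.random((n_features, k))   # basis vectors
    B = rng.random((n_classes, k))    # maps latent factors to labels
    H = rng.random((k, n))            # latent representation of each flow
    M = np.tile(labeled.astype(float), (n_classes, 1))  # label mask
    MY = M * Y
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep all factors non-negative.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        B *= (MY @ H.T) / ((M * (B @ H)) @ H.T + eps)
        H *= (W.T @ X + lam * B.T @ MY) / (
            W.T @ W @ H + lam * B.T @ (M * (B @ H)) + eps
        )
        loss = (np.linalg.norm(X - W @ H) ** 2
                + lam * np.linalg.norm(M * (Y - B @ H)) ** 2)
        history.append(loss)
    return W, B, H, history

# Synthetic stand-in for flow features: two groups with different dominant rows.
rng = np.random.default_rng(1)
X = np.hstack([
    rng.random((4, 10)) * np.array([[5.0], [5.0], [0.1], [0.1]]),
    rng.random((4, 10)) * np.array([[0.1], [0.1], [5.0], [5.0]]),
])
true = np.array([0] * 10 + [1] * 10)
Y = np.eye(2)[:, true]                 # one-hot labels, shape (2, 20)
labeled = np.zeros(20, dtype=bool)
labeled[[0, 1, 10, 11]] = True         # assume only four flows matched a signature

W, B, H, history = ssnmf(X, Y, labeled, k=2)
clusters = H.argmax(axis=0)            # assign each flow to its dominant factor
```

In this sketch, setting `lam=0` removes the label term, so `W` and `H` follow the plain NMF updates; comparing the two runs mirrors the supervised versus unsupervised comparison the abstract describes.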
engineering, electrical & electronic; physics, applied; computer science, information systems