Using Graph Theory for Improving Machine Learning-based Detection of Cyber Attacks

Giacomo Zonneveld, Lorenzo Principi, Marco Baldi
2024-02-13
Abstract: Early detection of network intrusions and cyber threats is one of the main pillars of cybersecurity. One of the most effective approaches for this purpose is to analyze network traffic with the help of artificial intelligence algorithms, with the aim of detecting the possible presence of an attacker by distinguishing it from a legitimate user. This is commonly done by collecting the traffic exchanged between terminals in a network and analyzing it on a per-packet or per-connection basis. In this paper, we propose instead to perform pre-processing of network traffic under analysis with the aim of extracting some new metrics on which we can perform more efficient detection and overcome some limitations of classical approaches. These new metrics are based on graph theory, and consider the network as a whole, rather than focusing on individual packets or connections. Our approach is validated through experiments performed on publicly available data sets, from which it results that it can not only overcome some of the limitations of classical approaches, but also achieve a better detection capability of cyber threats.
Machine Learning
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How to improve machine-learning-based network intrusion detection through graph-theory methods in order to enhance the early detection of network security threats**.

Specifically, traditional network intrusion detection methods mainly rely on analyzing each packet or connection in network traffic, extracting features such as IP addresses, port numbers, and the number of transmissions, and using these features to distinguish between normal users and potential attackers. However, these methods have some limitations:

1. **Low correlation between features and malicious behavior**: Traditional features (such as IP addresses and port numbers) have a weak association with the actual presence of malware.
2. **Considering each terminal in isolation**: Traditional methods usually analyze the behavior of each terminal separately, without considering the network as a whole.
3. **Susceptibility to attackers' evasion techniques**: Attackers can mask their behavior by manipulating data packets and by other means, thereby evading detection.
4. **Limitations imposed by encryption protocols**: Modern security protocols use encrypted payloads, making it difficult to analyze the content of data packets directly.

To solve these problems, this paper proposes a new method, namely **graph-theory-based network traffic pre-processing**. This method regards the entire network as a graph, where each node represents a terminal (such as an IP address) and each edge represents a connection between two terminals. By calculating topological features of the graph (such as degree centrality, closeness centrality, and betweenness centrality), new indicators can be extracted that allow network intrusions to be detected more efficiently.
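As an illustration of the graph-based pre-processing described above, the following is a minimal sketch, not the authors' implementation. It assumes an undirected, unweighted graph built from `(source IP, destination IP)` pairs, and it computes the three centralities named in the text using only the Python standard library. The function names, the example addresses, and the choice to ignore edge direction and weights are all assumptions made for illustration; the paper's exact graph construction and feature set are not specified here.

```python
from collections import defaultdict, deque


def build_graph(flows):
    """Build an undirected adjacency map from (src_ip, dst_ip) pairs (assumed input format)."""
    adj = defaultdict(set)
    for src, dst in flows:
        if src != dst:  # ignore self-loops
            adj[src].add(dst)
            adj[dst].add(src)
    return dict(adj)


def _bfs(adj, source):
    """Breadth-first search returning distances, shortest-path counts, predecessors, visit order."""
    dist = {source: 0}
    sigma = defaultdict(int)
    sigma[source] = 1
    preds = defaultdict(list)
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: (len(nb) / (n - 1) if n > 1 else 0.0) for v, nb in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of distances to them (no scaling for disconnected graphs)."""
    result = {}
    for v in adj:
        dist, _, _, _ = _bfs(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Normalized betweenness for an undirected graph, accumulated with Brandes' algorithm."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, preds, order = _bfs(adj, s)
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted twice, so dividing by (n-1)(n-2) normalizes to [0, 1].
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


# Hypothetical traffic: four hosts all talking to one hub (a star topology).
flows = [
    ("10.0.0.2", "10.0.0.1"),
    ("10.0.0.3", "10.0.0.1"),
    ("10.0.0.4", "10.0.0.1"),
    ("10.0.0.5", "10.0.0.1"),
    ("10.0.0.1", "10.0.0.2"),  # reverse direction collapses into the same edge
]
graph = build_graph(flows)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)

for ip in sorted(graph):
    print(ip, round(degree[ip], 3), round(closeness[ip], 3), round(betweenness[ip], 3))
```

In this toy star topology the hub receives the highest score on all three measures, which is the kind of network-wide signal that per-packet or per-connection features cannot express. In a detection pipeline, per-node values like these would be attached to the corresponding traffic records as additional input features for a classifier.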
### Main contributions of the paper:

- **Introducing graph-theory features**: Using concepts from graph theory (such as node degree and centrality) to describe the overall structure of the network, rather than focusing only on individual data packets or connections.
- **Improving detection performance**: Experimental results show that detection based on graph-theory features not only overcomes some limitations of traditional methods but also identifies malicious activity in the network more effectively.
- **Verifying the effectiveness of the method**: Experiments on public data sets (such as CIC-IDS2017) support the claim that this method achieves better detection capability than classical approaches.

In conclusion, this paper aims to improve machine-learning-based network intrusion detection systems by introducing graph-theory methods, so as to deal more effectively with modern network security threats.