Influence of fractal dimension on quality classification of computer attacks by machine learning methods
Oleg I. Sheluhin, Sergey Y. Rybakov, Anna V. Vanyushina
DOI: https://doi.org/10.36724/2409-5419-2023-15-1-57-64
2023-01-01
H&ES Research
Abstract: A promising direction for building an effective system to protect computer networks against attacks is the joint use of fractal analysis and data mining. It is proposed to increase the efficiency of network attack classification by introducing fractal dimension (FD) statistics of attacks as additional attributes alongside the existing ones. In contrast to well-known works, it is proposed to further improve classification efficiency by using not only the mean value but also other statistical characteristics of the FD of attacks and normal traffic as information features. These can be the variance and the skewness and kurtosis coefficients, which characterize the shape and parameters of the FD distribution. The effectiveness of the proposed method is evaluated with machine learning algorithms by assessing the quality of binary classification of network attacks and normal traffic, using the UNSW-NB15 dataset as an example. The following classification algorithms were applied to the dataset: k-nearest neighbors (k-NN), multiple logistic regression (LR), decision tree (DTC), random forest (RF), and AdaBoost. The following metrics were used to evaluate the constructed models: precision, recall, F-score, ROC curves, and AUC-ROC. It is shown that using the mean, variance, skewness, and kurtosis of the FD distribution as additional information features increases the efficiency of attack classification by an average of 10% for the k-NN and LR classification algorithms. For the DTC and RF algorithms, the greatest effect of the additional attributes is a reduction in training and testing time, by a factor of about 3.5 for each algorithm.
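The four FD statistics named in the abstract can be illustrated with a minimal Python sketch. It assumes that a series of FD estimates (for example, one per sliding traffic window) is already available; the abstract does not say which FD estimator or windowing scheme is used, so that step is left out. The function name `fd_statistics`, the feature keys, and the sample values are hypothetical, and the population (divide-by-n) moment definitions with excess kurtosis are one common convention, not necessarily the authors' choice.

```python
import math
from typing import Dict, Sequence


def fd_statistics(fd_values: Sequence[float]) -> Dict[str, float]:
    """Return mean, variance, skewness and excess kurtosis of FD estimates."""
    n = len(fd_values)
    if n < 2:
        raise ValueError("need at least two FD estimates")
    mean = sum(fd_values) / n
    # Central moments in population form (dividing by n); an assumed convention.
    m2 = sum((x - mean) ** 2 for x in fd_values) / n
    m3 = sum((x - mean) ** 3 for x in fd_values) / n
    m4 = sum((x - mean) ** 4 for x in fd_values) / n
    if m2 == 0.0:
        # Constant series: shape coefficients are undefined, report 0.0 by convention.
        return {"fd_mean": mean, "fd_var": 0.0, "fd_skew": 0.0, "fd_kurt": 0.0}
    return {
        "fd_mean": mean,
        "fd_var": m2,
        "fd_skew": m3 / math.pow(m2, 1.5),
        "fd_kurt": m4 / (m2 * m2) - 3.0,  # excess kurtosis (0 for a normal law)
    }


# Hypothetical FD estimates for one traffic flow (illustrative values only).
example_fd = [1.2, 1.4, 1.6, 1.8]
features = fd_statistics(example_fd)
```

The resulting dictionary could be appended to a record's existing attributes before training any of the classifiers listed above; how the features are joined to UNSW-NB15 records is not described in the abstract.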