Mitigating cyber threats through integration of feature selection and stacking ensemble learning: the LGBM and random forest intrusion detection perspective

Amit Kumar Mishra, Shweta Paliwal
DOI: https://doi.org/10.1007/s10586-022-03735-8
2022-09-15
Cluster Computing
Abstract: Network traffic has expanded dramatically and is set to grow even faster in the next few years. Security attacks are becoming increasingly coordinated, as attackers adopt new orchestrated techniques capable of launching attacks such as zero-day attack vectors and Slowloris. These attacks outpace the network analytics solutions currently deployed in network infrastructure. Machine learning (ML) based approaches are successfully countering modern-day attacks by analyzing patterns in encrypted network traffic. Detection strategies built on labelled datasets that combine synthesized attacks with modern normal traffic have become the need of the hour. In this study, three benchmark datasets are used: UNSW-NB15, NSL-KDD, and BoT-Internet of Things (BoT-IoT), which together cover modern-day orchestrated security attacks. The datasets are preprocessed, and feature selection is performed using information gain and the Pearson correlation coefficient. The selected features are then passed to the following classifiers: a stacking ensemble of light gradient boosting machine (LGBM) and random forest, stochastic gradient descent, Gaussian Naive Bayes (GNB), support vector machine (SVM), bagging with reduced error pruning, K-nearest neighbour, and AdaBoost. The stacking ensemble of LGBM and random forest produced the best prediction results on all three datasets.
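The feature selection step named in the abstract (information gain plus Pearson correlation) can be illustrated with a minimal pure-Python sketch. The abstract does not say how the two scores are combined, so the rule used here is an assumption: keep features whose information gain clears a threshold, then drop any feature that is highly correlated with an already kept one. The function names, the thresholds (`ig_threshold`, `corr_threshold`), and the toy feature names are all hypothetical, and the information gain shown applies only to discrete features.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, ig_threshold=0.1, corr_threshold=0.9):
    """Assumed combination rule (not taken from the paper):
    rank features by information gain, skip those below ig_threshold,
    and skip any feature whose |Pearson r| with a kept feature
    reaches corr_threshold."""
    scored = sorted(
        ((information_gain(col, labels), name) for name, col in columns.items()),
        key=lambda item: (-item[0], item[1]),  # highest gain first, ties by name
    )
    kept = []
    for gain, name in scored:
        if gain < ig_threshold:
            continue
        if all(abs(pearson(columns[name], columns[k])) < corr_threshold for k in kept):
            kept.append(name)
    return kept


# Toy data with hypothetical feature names: 0 = normal, 1 = attack.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "syn_flag": [0, 0, 0, 0, 1, 1, 1, 1],       # perfectly separates the classes
    "syn_flag_copy": [0, 0, 0, 0, 2, 2, 2, 2],  # redundant with syn_flag
    "dur_bucket": [0, 0, 0, 1, 1, 1, 1, 1],     # partially informative
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],          # carries no class information
}
selected = select_features(columns, labels)
print(selected)
```

On this toy data the redundant copy and the uninformative feature are filtered out, leaving `syn_flag` and `dur_bucket`. In practice the retained columns would then be fed to the classifiers listed in the abstract, for example a stacking ensemble built with a library of the reader's choice.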
computer science, information systems, theory & methods