A practical intrusion detection system based on denoising autoencoder and LightGBM classifier with improved detection performance
Sheikh Abdul Hameed Ayubkhan,Wun-She Yap,Ezra Morris,Mumtaj Begam Kasim Rawthar
DOI: https://doi.org/10.1007/s12652-022-04449-w
IF: 3.662
2022-11-08
Journal of Ambient Intelligence and Humanized Computing
Abstract:Autoencoder and conventional machine learning classifiers are widely used to design an intrusion detection system (IDS). However, noise and corruption in the high-dimensional network traffic samples will still affect the stability and performance of an autoencoder and other conventional machine learning based IDS models. The distortions in the input datasets cause deviations in the learnt patterns and always resulted in a low detection rate. Besides, the IDS classifiers use every single feature to train the samples, which makes the model consumes longer training time, computational resources and memory usage. The main aim of this proposal is to remove the distortions from the network traffic and train the IDS model in a faster manner to detect any category of intruders in the network traffic by achieving a higher detection rate in a short training time. To achieve this, we propose an intrusion detection system that combines a denoising autoencoder and LightGBM classifier. The denoising autoencoder removes the noise and corruptions in the network traffic, thereby possibly avoiding the deviations which can enhance the features learning capacity required for classification. Subsequently, to classify the samples, the LightGBM classifier is used. The classifier uses the feature histogram bins with larger gradients, thus avoiding using each feature at every iteration to accelerate the training speed and boost the predictive capacity of the model. The proposed model shows better detection performance improvement over nine benchmark datasets including CIDDS-001, CIDDS-002, ISCX-URL2016, UNSW-NB15, CIC-IDS-2017, ISCX-Tor2016, BoT-IoT, IoTID20 and Kyoto 2006+ for both binary classification and multi-classification tasks as compared to other existing IDS. The model achieves the maximum detection rate of over 99.60% for CIDDS-001, 99.90% for CIDDS-002, 97.00% for ISCX-Tor2016, 96.11% for UNSW-NB15, 99.86% for CIC-IDS17, 97.76% for ISCX-URL16, 99.91% for BoT-IoT, 97.43% for both IoTID2020 and Kyoto 2006+ datasets respectively, while the training time ranges from 1.10 to 21.78 s only. More importantly, the proposed model has higher learning and predictivity capacity which boosts the generalization capacity. The model also shows good performance in detecting all classes including the minority classes for all aforementioned datasets without any oversampling techniques. The efficiency of the model emphasizes that it can be deployed as a real-time model in any industrial network traffic that includes IoT based smart environment and fog-cloud computing network.
computer science, information systems,telecommunications, artificial intelligence