MLSTL-WSN: Machine Learning-based Intrusion Detection using SMOTETomek in WSNs

Md. Alamin Talukder, Selina Sharmin, Md Ashraf Uddin, Md Manowarul Islam, Sunil Aryal
2024-02-18
Abstract: Wireless Sensor Networks (WSNs) play a pivotal role as infrastructure, encompassing both stationary and mobile sensors. These sensors self-organize and establish multi-hop connections for communication, collectively sensing, gathering, processing, and transmitting data about their surroundings. Despite their significance, WSNs face rapid and detrimental attacks that can disrupt functionality. Existing intrusion detection methods for WSNs encounter challenges such as low detection rates, computational overhead, and false alarms. These issues stem from sensor node resource constraints, data redundancy, and high correlation within the network. To address these challenges, we propose an innovative intrusion detection approach that integrates Machine Learning (ML) techniques with the Synthetic Minority Oversampling Technique Tomek Link (SMOTE-TomekLink) algorithm. This blend synthesizes minority instances and eliminates Tomek links, resulting in a balanced dataset that significantly enhances detection accuracy in WSNs. Additionally, we incorporate feature scaling through standardization to render input features consistent and scalable, facilitating more precise training and detection. To counteract imbalanced WSN datasets, we employ the SMOTE-Tomek resampling technique, mitigating overfitting and underfitting issues. Our comprehensive evaluation, using the WSN Dataset (WSN-DS) containing 374,661 records, identifies the optimal model for intrusion detection in WSNs. The standout outcome of our research is the remarkable performance of our model: in binary classification, it achieves an accuracy of 99.78%, and in multiclass classification, it attains an accuracy of 99.92%. These findings underscore the efficiency and superiority of our proposal in the context of WSN intrusion detection, showcasing its effectiveness in detecting and mitigating intrusions in WSNs.
Cryptography and Security, Machine Learning
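The resampling step described in the abstract (synthesize minority instances, then eliminate Tomek links) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it balances a single minority class in a binary setting, uses brute-force Euclidean nearest-neighbor search, and the helper names `smote`, `tomek_links`, and `smote_tomek` are hypothetical. It removes both members of each Tomek link; some variants remove only the majority-class member. A library implementation such as imbalanced-learn's `SMOTETomek` would normally be used instead; which tooling the paper used is not stated here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic samples, each placed at a
    random point on the segment between a random minority sample and one of
    its k nearest minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself.
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: euclidean(base, m),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def tomek_links(X, y):
    """Hypothetical helper: indices of samples in Tomek links, i.e. pairs of
    opposite-class samples that are each other's nearest neighbour."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: euclidean(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    linked = set()
    for i, j in enumerate(nn):
        if y[i] != y[j] and nn[j] == i:
            linked.update((i, j))
    return linked


def smote_tomek(X, y, minority_label, k=5, seed=0):
    """Binary-only sketch: oversample the minority class up to the majority
    count with SMOTE, then drop every sample that forms a Tomek link."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = n_majority - len(minority)
    new = smote(minority, n_new, k, seed) if n_new > 0 else []
    X_res = X + new
    y_res = y + [minority_label] * len(new)
    drop = tomek_links(X_res, y_res)
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


# Hypothetical toy data: 5 "normal" (0) samples and 2 "attack" (1) samples.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.25, 0.15],
     [1.0, 1.0], [1.1, 0.9]]
y = [0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = smote_tomek(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))
```

On this toy data the two classes are well separated, so no Tomek links form and the result holds five samples per class. When classes overlap, `tomek_links` returns the boundary pairs that get dropped; for example, on the 1-D points `[[0.0], [1.0], [1.1], [3.0]]` with labels `[0, 0, 1, 1]` it flags indices 1 and 2.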
What problem does this paper attempt to address?
This paper addresses intrusion detection in wireless sensor networks (WSNs), particularly under three challenges: imbalanced datasets, limited computing resources, and high false-alarm rates. Specifically:

1. **Imbalanced Dataset Problem**: In WSNs, records of normal behavior usually far outnumber records of intrusions, so the dataset is imbalanced. This imbalance biases machine-learning models toward predicting the majority class (normal behavior), which lowers detection accuracy on the minority class (intrusions).
2. **Limited Computing Resources**: Sensor nodes in WSNs usually have limited computing power and storage, so the intrusion detection method must run efficiently in a resource-constrained environment.
3. **High False-Alarm Rate**: Existing intrusion detection methods often have relatively high false-alarm rates; that is, they misclassify normal behavior as intrusions. This undermines security and reliability in practical deployments.

To address these problems, the paper proposes a machine-learning-based intrusion detection method combined with the Synthetic Minority Over-sampling Technique (SMOTE) and Tomek link removal (together, SMOTE-Tomek). The method improves intrusion detection in the following ways:

- **Dataset Balancing**: SMOTE-Tomek increases the number of minority-class samples and removes Tomek links (pairs of opposite-class samples that are each other's nearest neighbors, typically noisy or borderline points), making the dataset more balanced and improving the model's ability to detect intrusions.
- **Feature Standardization**: The input features are standardized so that all features are on the same scale, improving the accuracy of training and detection.
- **Multiple Machine-Learning Algorithms**: Several algorithms are evaluated, including Decision Tree (DT), Random Forest (RF), Multi-Layer Perceptron (MLP), K-Nearest Neighbors (KNN), XGBoost (XGB), and LightGBM (LGB), to capture different patterns in the data and identify the best-performing detector.

Through these improvements, the paper aims to provide a more accurate and reliable WSN intrusion detection solution that addresses the shortcomings of existing methods. In the experiments on WSN-DS (374,661 records), the method reaches an accuracy of 99.78% in binary classification and 99.92% in multiclass classification, which the authors report as significantly outperforming existing methods.

### Formula Display

In the data preprocessing step, the paper uses standardization (z-score scaling) to convert features of different scales to a common scale:

\[ X_{\text{standardized}}=\frac{X - \mu}{\sigma} \]

where:

- \( X_{\text{standardized}} \) is the standardized feature value.
- \( X \) is the original feature value.
- \( \mu \) is the mean of feature \( X \).
- \( \sigma \) is the standard deviation of feature \( X \).

After standardization, each feature has zero mean and unit standard deviation on the data used to compute \( \mu \) and \( \sigma \), so no feature dominates the learning process merely because it is measured on a larger numeric scale. This matters most for scale-sensitive learners such as KNN and MLP.
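The standardization formula can be applied per feature as a short sketch. Assumptions not stated in the paper summary: features arrive as rows of floats, \( \sigma \) is the population standard deviation (the same convention scikit-learn's `StandardScaler` uses), a constant feature is mapped to 0, and \( \mu \) and \( \sigma \) are fitted on training rows only and then reused for test rows so no test information leaks into preprocessing. The function names `fit_scaler` and `transform` are hypothetical.

```python
import statistics


def fit_scaler(rows):
    """Compute (mu, sigma) for each feature column from training rows only."""
    columns = list(zip(*rows))
    # pstdev = population standard deviation (divides by n, not n - 1).
    return [(statistics.fmean(c), statistics.pstdev(c)) for c in columns]


def transform(rows, params):
    """Apply X_standardized = (X - mu) / sigma to every feature of every row."""
    out = []
    for row in rows:
        scaled = []
        for x, (mu, sigma) in zip(row, params):
            # Assumption: a constant feature (sigma == 0) is mapped to 0.0.
            scaled.append((x - mu) / sigma if sigma else 0.0)
        out.append(scaled)
    return out


# Hypothetical rows with two features on very different scales.
train = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
test = [[4.0, 400.0]]

params = fit_scaler(train)
train_z = transform(train, params)
test_z = transform(test, params)  # reuses the training mu and sigma
print(train_z)
print(test_z)
```

Both columns end up on the same scale (roughly -1.22, 0, 1.22 for the training rows), even though the second raw feature is 100 times larger than the first.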
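Problems 1 and 3 in the list above (class imbalance and false alarms) both concern what accuracy alone can hide. The sketch below computes accuracy, detection rate (recall on the attack class), and false alarm rate, \( FP / (FP + TN) \), from a binary confusion matrix with attacks as the positive class. The function name `detection_metrics` is hypothetical, and the counts are invented for illustration; they are not figures from the paper or from WSN-DS.

```python
def detection_metrics(tp, fn, fp, tn):
    """Binary IDS metrics, treating 'attack' as the positive class.

    tp: attacks flagged as attacks     fn: attacks missed (labelled normal)
    fp: normal records flagged         tn: normal records passed
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Detection rate = recall on the attack class.
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    # False alarm rate = share of normal records wrongly flagged as attacks.
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, detection_rate, false_alarm_rate


# Hypothetical test set: 950 normal records and 50 attacks.
# Model A labels every record "normal" (a majority-class predictor).
acc_a, dr_a, far_a = detection_metrics(tp=0, fn=50, fp=0, tn=950)
# Model B catches 45 attacks but raises 19 false alarms.
acc_b, dr_b, far_b = detection_metrics(tp=45, fn=5, fp=19, tn=931)

print(f"A: accuracy={acc_a:.3f} detection={dr_a:.3f} false_alarm={far_a:.3f}")
print(f"B: accuracy={acc_b:.3f} detection={dr_b:.3f} false_alarm={far_b:.3f}")
```

Model A reaches 95% accuracy while detecting no attacks at all, which is why imbalance handling and the false-alarm rate both need to be reported alongside headline accuracy.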