Detecting Compromised IoT Devices Using Autoencoders with Sequential Hypothesis Testing

Md Mainuddin, Zhenhai Duan, Yingfei Dong
DOI: https://doi.org/10.1109/BigData59044.2023.10386399
2024-04-21
Abstract: IoT devices fundamentally lack built-in security mechanisms to protect themselves from security attacks. Existing works on improving IoT security mostly focus on detecting anomalous behaviors of IoT devices. However, these existing anomaly detection schemes may trigger an overwhelmingly large number of false alerts, rendering them unusable in detecting compromised IoT devices. In this paper we develop an effective and efficient framework, named CUMAD, to detect compromised IoT devices. Instead of directly relying on individual anomalous events, CUMAD aims to accumulate sufficient evidence in detecting compromised IoT devices, by integrating an autoencoder-based anomaly detection subsystem with a sequential probability ratio test (SPRT)-based sequential hypothesis testing subsystem. CUMAD can effectively reduce the number of false alerts in detecting compromised IoT devices, and moreover, it can detect compromised IoT devices quickly. Our evaluation studies based on the public-domain N-BaIoT dataset show that CUMAD can on average reduce the false positive rate from about 3.57% using only the autoencoder-based anomaly detection scheme to about 0.5%; in addition, CUMAD can detect compromised IoT devices quickly, with less than 5 observations on average.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The paper asks: **how can compromised Internet of Things (IoT) devices be detected effectively and efficiently, with a low false positive rate and a short detection delay?**

### Background and Problem Description

IoT devices are increasingly used in daily life, for example in smart homes, healthcare, transportation, and power networks. However, these devices usually lack built-in security mechanisms and are vulnerable to a range of security attacks. Most existing methods for improving IoT security focus on detecting anomalous behavior of IoT devices, but they may trigger a large number of false alerts, which makes them hard to use for detecting compromised devices in actual deployments.

### Core Contributions of the Paper

To address this problem, the authors propose a framework named CUMAD (Cumulative Anomaly Detection), which detects compromised IoT devices by accumulating sufficient evidence rather than reacting to individual anomalous events. CUMAD combines two subsystems:

1. **Anomaly Detection Subsystem Based on Autoencoder**:
   - An autoencoder is an unsupervised neural network trained to reconstruct its input. It captures normal behavior patterns by learning a latent-space representation of the input data.
   - During the training phase, the autoencoder is trained on normal network traffic data to learn a model of normal behavior.
   - During the detection phase, a data point is flagged as anomalous if its reconstruction error exceeds a predefined threshold.
2. **Sequential Hypothesis Testing Subsystem Based on Sequential Probability Ratio Test (SPRT)**:
   - SPRT is a sequential statistical test that reaches a decision as soon as the accumulated evidence crosses one of two thresholds, rather than after a fixed number of observations.
   - SPRT decides whether an IoT device is compromised by accumulating multiple anomaly detection results, rather than relying on a single anomalous event.
   - This cumulative approach can significantly reduce the false positive rate while still detecting compromised devices quickly.

### Performance Evaluation

To verify the effectiveness of CUMAD, the authors evaluated it on the publicly available N-BaIoT dataset. The experimental results show that:

- CUMAD reduces the average false positive rate from about 3.57% (autoencoder-based anomaly detection alone) to about 0.5%, roughly a sevenfold reduction.
- CUMAD detects compromised IoT devices quickly, needing fewer than 5 observations on average.

### Formula Summary

- **Reconstruction Error of Autoencoder**:
  \[ L(x, x')=\text{MSE}(x, x') = \frac{1}{n}\sum_{i = 1}^{n}(x_i - x'_i)^2 \]
  where \(x\) is the input data, \(x'\) is the reconstructed output, and \(n\) is the number of components of \(x\).
- **Probability Ratio Update Formula of SPRT**:
  \[ \Lambda_n=\sum_{i = 1}^{n}z_i=\sum_{i = 1}^{n}\ln\left(\frac{\Pr(y_i|H_1)}{\Pr(y_i|H_0)}\right) \]
  where \(z_i = \ln\left(\frac{\Pr(y_i|H_1)}{\Pr(y_i|H_0)}\right)\) is the contribution of the \(i\)-th observation \(y_i\), and \(\Lambda_n\) is the cumulative log probability ratio after \(n\) observations (here \(n\) counts observations, not input components).
- **Decision Boundary of SPRT**:
  \[ A\approx\ln\left(\frac{\beta}{1 - \alpha}\right),\quad B\approx\ln\left(\frac{1 - \beta}{\alpha}\right) \]
  where \(\alpha\) is the user-desired false positive rate and \(\beta\) is the user-desired false negative rate. The test stops and accepts \(H_1\) (the device is compromised) once \(\Lambda_n \ge B\), stops and accepts \(H_0\) once \(\Lambda_n \le A\), and otherwise takes another observation.

In this way, CUMAD substantially reduces the false positive rate while still detecting compromised IoT devices quickly, which makes it more practical for real deployments.
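
The formulas in the summary can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes each observation \(y_i\) is a binary anomaly flag produced by thresholding the reconstruction error, modeled as a Bernoulli variable with `theta0` = Pr(y = 1 | H0) and `theta1` = Pr(y = 1 | H1). The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def reconstruction_error(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def sprt_thresholds(alpha, beta):
    """Approximate SPRT boundaries: A (accept H0) and B (accept H1)."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper


def sprt(observations, theta0, theta1, alpha, beta):
    """Run the SPRT over binary anomaly flags (1 = anomalous, 0 = normal).

    Assumes a Bernoulli observation model (an assumption of this sketch).
    Returns (decision, n): decision is "H1", "H0", or "undecided",
    and n is the number of observations consumed.
    """
    lower, upper = sprt_thresholds(alpha, beta)
    # Per-observation log probability ratio z_i for each possible flag value.
    step_if_anomalous = math.log(theta1 / theta0)
    step_if_normal = math.log((1 - theta1) / (1 - theta0))

    llr = 0.0  # cumulative log probability ratio, Lambda_n
    n = 0
    for n, y in enumerate(observations, start=1):
        llr += step_if_anomalous if y == 1 else step_if_normal
        if llr >= upper:
            return "H1", n  # enough evidence that the device is compromised
        if llr <= lower:
            return "H0", n  # enough evidence that the device is normal
    return "undecided", n


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    params = dict(theta0=0.2, theta1=0.9, alpha=0.01, beta=0.01)
    print(sprt([1, 1, 1, 1, 1], **params))  # ('H1', 4)
    print(sprt([0, 0, 0, 0, 0], **params))  # ('H0', 3)
```

With these illustrative parameters, a single anomalous flag moves \(\Lambda_n\) by about 1.5 while the upper boundary \(B\) is about 4.6, so one isolated alert cannot trigger a detection. This is the mechanism by which accumulating evidence suppresses false alerts.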