GeMID: Generalizable Models for IoT Device Identification

Kahraman Kostas, Rabia Yasa Kostas, Mike Just, Michael A. Lones
2024-11-06
Abstract: With the proliferation of Internet of Things (IoT) devices, ensuring their security has become paramount. Device identification (DI), which distinguishes IoT devices based on their traffic patterns, plays a crucial role in both differentiating devices and identifying vulnerable ones, closing a serious security gap. However, existing approaches to DI that build machine learning models often overlook the challenge of model generalizability across diverse network environments. In this study, we propose a novel framework to address this limitation and evaluate the generalizability of DI models across datasets collected within different network environments. Our approach involves a two-step process: first, we develop a feature and model selection method that is more robust to generalization issues by using a genetic algorithm with external feedback and datasets from distinct environments to refine the selections. Second, the resulting DI models are then tested on further independent datasets in order to robustly assess their generalizability. We demonstrate the effectiveness of our method by empirically comparing it to alternatives, highlighting how fundamental limitations of commonly employed techniques such as sliding window and flow statistics limit their generalizability. Our findings advance research in IoT security and device identification, offering insights into improving model effectiveness and mitigating risks in IoT networks.
Cryptography and Security, Artificial Intelligence, Networking and Internet Architecture
What problem does this paper attempt to address?
This paper addresses the poor generalizability of machine learning models for Internet of Things (IoT) device identification (DI). Most existing DI methods train and test their models within a single network environment. As a result, the models perform poorly elsewhere and fail to identify devices of the same type that operate in a different network. To address this, the paper proposes a framework for evaluating and improving the generalizability of DI models across network environments. Its main contributions are:

1. **Comprehensive evaluation of model generalizability**: Evaluates machine-learning-based DI models across multiple datasets from different network environments and identifies the key factors that affect generalizability.
2. **Novel research framework**: Introduces a two-stage framework for evaluating the generalizability of DI models under varied conditions, providing a more rigorous testing method.
3. **Feature selection and its impact on generalizability**: Shows that the choice and construction of features has a decisive impact on how well DI models generalize. In particular, methods based on single-packet features generalize better than those based on flow or window statistics.
4. **Validation of packet-based methods**: Confirms through empirical comparison that methods based on single-packet features outperform those based on flow or window statistics in terms of generalizability.
5. **Transparency and reproducibility**: Publicly shares code and analysis results to promote transparency and encourage further research.

### Core of the problem

- **Existing problem**: Most current DI methods rely on datasets from a single network environment for both training and testing, ignoring how the model behaves in other network environments.
- **Solution**: A two-stage framework that uses a genetic algorithm with external feedback to select features that generalize better, and then evaluates the resulting model on an independent dataset.

### Main steps

1. **Feature and model selection**: Use a genetic algorithm and datasets from different environments to select the features and models that generalize best.
2. **Model evaluation**: Test the selected model on a further independent dataset to check that it remains effective in a different network environment.

In this way, the paper both improves the generalizability of DI models and provides insights and tools for future IoT security research.
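
The two steps can be illustrated with a small sketch. This is not the authors' implementation: the data is synthetic, the feature names (`stable_a`, `env_a`, and so on) are hypothetical, and a nearest-centroid classifier stands in for whatever models the paper evaluates. The only idea it tries to show is that the genetic algorithm scores each feature subset on a dataset from a different environment than the training data (the "external feedback"), and that the winning subset is then checked on a third, untouched environment.

```python
import random

# Hypothetical feature names: two carry device-specific signal in every
# environment, two encode a signal that changes with the environment.
FEATURES = ["stable_a", "stable_b", "env_a", "env_b"]


def make_env(rng, shift, n_per_class=50):
    """Synthetic two-device dataset; `shift` models environment dependence."""
    data = []
    for label in (0, 1):
        sign = 1 if label else -1
        for _ in range(n_per_class):
            x = [
                2.0 * label + rng.gauss(0, 0.3),
                -2.0 * label + rng.gauss(0, 0.3),
                sign * shift + rng.gauss(0, 0.3),
                sign * shift + rng.gauss(0, 0.3),
            ]
            data.append((x, label))
    return data


def centroids(data, mask):
    """Per-class mean vector over the features enabled in `mask`."""
    sums, counts = {}, {}
    for x, y in data:
        sel = [v for v, m in zip(x, mask) if m]
        acc = sums.setdefault(y, [0.0] * len(sel))
        for i, v in enumerate(sel):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def accuracy(train, test, mask):
    """Train a nearest-centroid model on `train`, score it on `test`."""
    if not any(mask):
        return 0.0
    cents = centroids(train, mask)
    hits = 0
    for x, y in test:
        sel = [v for v, m in zip(x, mask) if m]
        pred = min(
            cents,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(sel, cents[c])),
        )
        hits += pred == y
    return hits / len(test)


def evolve(train, feedback, rng, pop_size=20, generations=15, p_mut=0.2):
    """Genetic search over feature masks; fitness is cross-environment accuracy."""
    n = len(FEATURES)
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = accuracy(train, feedback, key)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: keep the best mask
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


rng = random.Random(0)
env_train = make_env(rng, shift=3.0)      # environment used for training
env_feedback = make_env(rng, shift=-3.0)  # distinct environment for GA feedback
env_holdout = make_env(rng, shift=1.0)    # independent environment for step 2

# Step 1: feature selection with external feedback.
best_mask, feedback_acc = evolve(env_train, env_feedback, rng)
# Step 2: evaluate the selected features on the independent dataset.
holdout_acc = accuracy(env_train, env_holdout, best_mask)

selected = [f for f, m in zip(FEATURES, best_mask) if m]
print("selected features:", selected)
print("feedback accuracy:", feedback_acc, "holdout accuracy:", holdout_acc)
```

In this toy setup, any mask that includes an environment-dependent feature scores well inside the training environment but collapses on the feedback dataset, so the search is pushed toward the stable features. Scoring masks on a held-out split of the training environment instead would not expose that difference, which is the motivation for using a separate environment as feedback.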