GothX: a generator of customizable, legitimate and malicious IoT network traffic

Manuel Poisson, Kensuke Fukuda, Rodrigo Carnier
2024-07-24
Abstract: In recent years, machine learning-based anomaly detection (AD) has become an important measure against security threats from Internet of Things (IoT) networks. Machine learning (ML) models for network traffic AD require datasets to be trained, evaluated and compared. Due to the necessity of realistic and up-to-date representation of IoT security threats, new datasets need to be constantly generated to train relevant AD models. Since most traffic generation setups are developed considering only the author's use, replication of traffic generation becomes an additional challenge to the creation and maintenance of useful datasets. In this work, we propose GothX, a flexible traffic generator to create both legitimate and malicious traffic for IoT datasets. As a fork of Gotham Testbed, GothX is developed with five requirements: 1) easy configuration of network topology, 2) customization of traffic parameters, 3) automatic execution of legitimate and attack scenarios, 4) IoT network heterogeneity (the current iteration supports MQTT, Kafka and SINETStream services), and 5) automatic labeling of generated datasets. GothX is validated by two use cases: a) re-generation and enrichment of traffic from the IoT dataset MQTTset, and b) automatic execution of a new realistic scenario including the exploitation of a CVE specific to the Kafka-MQTT network topology and leading to a DDoS attack. We also contribute with two datasets containing mixed traffic, one made from the enriched MQTTset traffic and another from the attack scenario. We evaluated the scalability of GothX (450 IoT sensors in a single machine), the replication of the use cases and the validity of the generated datasets, confirming the ability of GothX to improve the current state-of-the-art of network traffic generation.
Cryptography and Security
What problem does this paper attempt to address?
The paper addresses the difficulty of generating and maintaining up-to-date Internet of Things (IoT) network traffic datasets, particularly datasets that mix legitimate and malicious traffic. The authors point out the following key issues:

1. **Limitations of existing datasets**:
   - Existing public datasets usually contain only legitimate or only malicious traffic and lack diversity.
   - Dataset labels are inaccurate or missing, making it difficult to distinguish normal from abnormal traffic when training machine-learning models.
   - Datasets are updated slowly and cannot reflect the latest IoT security threats.
2. **Complexity and cost of traffic generation**:
   - Building a test platform from scratch to generate customized data requires a great deal of time and expertise.
   - Most existing traffic-generation tools are designed for a specific study and are not flexible enough to adapt to new experimental requirements.
3. **Insufficient functionality of traffic-generation tools**:
   - Existing tools such as Gotham can generate legitimate and malicious traffic, but their network simulation scenarios are not flexible enough and they cannot automatically generate labeled datasets.
   - Some common IoT communication protocols and services (such as Kafka and SINETStream) are not fully supported.

To solve these problems, the authors propose a new traffic generator named GothX. GothX aims to generate customizable, labeled datasets of legitimate and malicious IoT network traffic, and has the following features:

- **Ease of use and customizability**: Users can adjust network topologies, traffic parameters, and attack scenarios through configuration files.
- **Automated execution**: It automatically executes legitimate and attack scenarios and automatically generates labeled datasets.
- **Heterogeneous network support**: It supports multiple IoT communication protocols and services (such as MQTT, Kafka, and SINETStream) to simulate a real-life IoT environment.
- **High scalability and reproducibility**: It scales up to 450 IoT sensors on a single machine and ensures the reproducibility of results.

Through these improvements, GothX not only improves the quality and diversity of datasets but also simplifies the process of generating customized datasets, thus better supporting the training and evaluation of machine-learning-based anomaly-detection models.
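
To make the idea of automatic labeling concrete, the following is a minimal Python sketch of one common approach: marking a packet as malicious when it involves a known attacker address during a scheduled attack window. This is an illustration under stated assumptions, not GothX's actual labeling logic, which the summary above does not describe. The names `Packet`, `AttackWindow`, `label_packet`, and `label_capture`, as well as the IP addresses and time values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet record."""
    timestamp: float  # seconds since the start of the capture
    src_ip: str
    dst_ip: str


@dataclass(frozen=True)
class AttackWindow:
    """Hypothetical description of one scheduled attack step."""
    start: float
    end: float
    attacker_ips: frozenset
    name: str


def label_packet(packet, windows):
    """Return the attack name if the packet falls inside an attack window
    and involves an attacker address; otherwise return 'legitimate'."""
    for window in windows:
        in_window = window.start <= packet.timestamp <= window.end
        involves_attacker = (
            packet.src_ip in window.attacker_ips
            or packet.dst_ip in window.attacker_ips
        )
        if in_window and involves_attacker:
            return window.name
    return "legitimate"


def label_capture(packets, windows):
    """Pair every packet with its label."""
    return [(packet, label_packet(packet, windows)) for packet in packets]


# Assumed example scenario: one DDoS step between t=10s and t=20s.
windows = [
    AttackWindow(start=10.0, end=20.0,
                 attacker_ips=frozenset({"10.0.0.66"}), name="ddos")
]
packets = [
    Packet(5.0, "10.0.0.2", "10.0.0.1"),    # sensor traffic before the attack
    Packet(12.0, "10.0.0.66", "10.0.0.1"),  # attacker traffic during the attack
    Packet(15.0, "10.0.0.2", "10.0.0.1"),   # sensor traffic during the attack
    Packet(25.0, "10.0.0.66", "10.0.0.1"),  # attacker host after the window
]
labels = [label for _, label in label_capture(packets, windows)]
```

Because a generator controls both the attack schedule and the addresses of the attacking nodes, labels of this kind can be derived from the scenario definition rather than assigned by hand, which is the general motivation for generating labels automatically.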