Abstract: In recent years, machine learning-based anomaly detection (AD) has become an important measure against security threats from Internet of Things (IoT) networks. Machine learning (ML) models for network traffic AD require datasets to be trained, evaluated and compared. Due to the necessity of realistic and up-to-date representation of IoT security threats, new datasets need to be constantly generated to train relevant AD models. Since most traffic generation setups are developed considering only the author's use, replication of traffic generation becomes an additional challenge to the creation and maintenance of useful datasets. In this work, we propose GothX, a flexible traffic generator to create both legitimate and malicious traffic for IoT datasets. As a fork of Gotham Testbed, GothX is developed with five requirements: 1) easy configuration of network topology, 2) customization of traffic parameters, 3) automatic execution of legitimate and attack scenarios, 4) IoT network heterogeneity (the current iteration supports MQTT, Kafka and SINETStream services), and 5) automatic labeling of generated datasets. GothX is validated by two use cases: a) re-generation and enrichment of traffic from the IoT dataset MQTTset, and b) automatic execution of a new realistic scenario including the exploitation of a CVE specific to the Kafka-MQTT network topology and leading to a DDoS attack. We also contribute with two datasets containing mixed traffic, one made from the enriched MQTTset traffic and another from the attack scenario. We evaluated the scalability of GothX (450 IoT sensors in a single machine), the replication of the use cases and the validity of the generated datasets, confirming the ability of GothX to improve the current state-of-the-art of network traffic generation.

What problem does this paper attempt to address?

The main problem this paper addresses is the difficulty of generating and maintaining up-to-date Internet of Things (IoT) network traffic datasets, especially datasets that mix legitimate and malicious traffic. Specifically, the authors point out the following key issues:

1. **Limitations of existing datasets**:
   - Existing public datasets usually contain only legitimate or only malicious traffic and lack diversity.
   - Dataset labels are inaccurate or missing, making it difficult to distinguish normal from abnormal traffic when training machine-learning models.
   - Datasets are updated slowly and cannot reflect the latest IoT security threats.
2. **Complexity and cost of traffic generation**:
   - Building a test platform from scratch to generate customized data requires a great deal of time and expertise.
   - Most existing traffic-generation tools are designed for a specific study and are not flexible enough to adapt to new experimental requirements.
3. **Insufficient functionality of traffic-generation tools**:
   - Existing tools such as Gotham can generate legitimate and malicious traffic, but their network simulation scenarios are not flexible enough and they cannot automatically generate labeled datasets.
   - Some common IoT communication protocols and services (such as Kafka and SINETStream) are not fully supported.

To solve these problems, the authors propose a new traffic generator named GothX. GothX aims to generate customizable, labeled datasets of legitimate and malicious IoT network traffic and has the following features:

- **Ease of use and customizability**: Users can adjust network topologies, traffic parameters, and attack scenarios through configuration files.
- **Automated execution**: It automatically executes legitimate and attack scenarios and automatically generates labeled datasets.
- **Heterogeneous network support**: It supports multiple IoT communication protocols and services (such as MQTT, Kafka, and SINETStream) to simulate a real-life IoT environment.
- **High scalability and reproducibility**: It scales to 450 IoT sensors on a single machine and supports reproducible results.

Through these improvements, GothX not only improves the quality and diversity of datasets but also simplifies the process of generating customized datasets, thus better supporting the training and evaluation of machine-learning-based anomaly-detection models.
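To make the idea of automatically labeled mixed traffic concrete, here is a minimal Python sketch. It is not GothX's labeling method, which this summary does not describe. It assumes, purely for illustration, that the generator knows when each attack scenario ran and which hosts took part, and labels a packet as malicious when it falls inside such a window and involves one of those hosts. The names `Packet`, `AttackWindow`, `label_packet`, and `label_capture`, and all IP addresses and timestamps, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet record (not a GothX type)."""
    timestamp: float  # seconds since the start of the capture
    src_ip: str
    dst_ip: str


@dataclass(frozen=True)
class AttackWindow:
    """Hypothetical record of when an attack ran and which hosts took part."""
    start: float
    end: float
    attacker_ips: frozenset


def label_packet(packet, windows):
    """Return 'malicious' if the packet falls in an attack window and
    involves an attacker host as source or destination, else 'legitimate'."""
    for window in windows:
        in_window = window.start <= packet.timestamp <= window.end
        involves_attacker = (
            packet.src_ip in window.attacker_ips
            or packet.dst_ip in window.attacker_ips
        )
        if in_window and involves_attacker:
            return "malicious"
    return "legitimate"


def label_capture(packets, windows):
    """Pair every packet with its label, preserving capture order."""
    return [(packet, label_packet(packet, windows)) for packet in packets]


# Illustrative data only: one attack between t=10s and t=20s from one host.
windows = [AttackWindow(10.0, 20.0, frozenset({"192.168.0.66"}))]
packets = [
    Packet(5.0, "192.168.0.10", "192.168.0.2"),   # sensor traffic before the attack
    Packet(12.0, "192.168.0.66", "192.168.0.2"),  # attacker traffic during the attack
    Packet(15.0, "192.168.0.10", "192.168.0.2"),  # sensor traffic during the attack
    Packet(25.0, "192.168.0.66", "192.168.0.2"),  # attacker host after the window closed
]
labels = [label for _, label in label_capture(packets, windows)]
print(labels)
```

Labeling by scenario metadata rather than by inspecting payloads is one reason a generator can produce ground truth that is hard to recover from captures after the fact. A real pipeline would likely need finer rules, for example for replies sent by victims.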

GothX: a generator of customizable, legitimate and malicious IoT network traffic
