Introducing a Comprehensive, Continuous, and Collaborative Survey of Intrusion Detection Datasets

Philipp Bönninghausen, Rafael Uetz, Martin Henze
DOI: https://doi.org/10.1145/3675741.3675754
2024-08-05
Abstract: Researchers in the highly active field of intrusion detection largely rely on public datasets for their experimental evaluations. However, the large number of existing datasets, the discovery of previously unknown flaws therein, and the frequent publication of new datasets make it hard to select suitable options and sufficiently understand their respective limitations. Hence, there is a great risk of drawing invalid conclusions from experimental results with respect to detection performance of novel methods in the real world. While there exist various surveys on intrusion detection datasets, they have deficiencies in providing researchers with a profound decision basis since they lack comprehensiveness, actionable details, and up-to-dateness. In this paper, we present COMIDDS, an ongoing effort to comprehensively survey intrusion detection datasets with an unprecedented level of detail, implemented as a website backed by a public GitHub repository. COMIDDS allows researchers to quickly identify suitable datasets depending on their requirements and provides structured and critical information on each dataset, including actual data samples and links to relevant publications. COMIDDS is freely accessible, regularly updated, and open to contributions.
Cryptography and Security
What problem does this paper attempt to address?
The main problem this paper addresses is the difficulty of selecting and understanding existing intrusion detection datasets. When evaluating new intrusion detection methods, researchers largely rely on public datasets. However, there are many such datasets, their quality varies, and new ones are published frequently. This makes it hard for researchers to find datasets that suit their needs and to fully understand the limitations of those datasets. As a result, there is a great risk of drawing invalid conclusions from experimental results about the detection performance of new methods in the real world.

To address these problems, the authors present **COMIDDS**, a comprehensive, continuously updated, and collaborative survey of intrusion detection datasets, implemented as a website backed by a public GitHub repository. The main goals of COMIDDS are:

1. **Provide detailed information**: Give a detailed, structured description of each dataset, including the environment, activities, data format, related publications, and sample data fragments.
2. **Continuous updates**: Keep the information timely and accurate through regular releases and change logs.
3. **Promote collaboration**: Build on a GitHub repository so that the community can contribute and make corrections.
4. **Ease of use**: Provide machine-readable CSV files so that users can easily perform custom sorting, filtering, or plotting.

Through these measures, COMIDDS helps researchers find suitable intrusion detection datasets more effectively and improves their understanding of the datasets they select, thereby promoting more reliable research results.

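
The custom sorting and filtering enabled by the machine-readable CSV files could look roughly like the following Python sketch. It is only an illustration: the column names (`name`, `year`, `labeled`, `data_type`) and the sample rows are hypothetical placeholders, not the actual schema or contents of the COMIDDS export, so they would need to be adapted to the real file.

```python
import csv
import io

# Hypothetical stand-in for a dataset-survey CSV export. The column names and
# rows are invented for illustration and do not reflect the real COMIDDS file.
SAMPLE_CSV = """name,year,labeled,data_type
dataset_a,2017,yes,network
dataset_b,2021,no,host
dataset_c,2019,yes,host
dataset_d,2023,yes,network
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting the year column to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["year"] = int(row["year"])
    return rows


def select_datasets(rows, min_year, labeled_only=True):
    """Keep datasets from min_year onward (optionally labeled only), newest first."""
    kept = [
        r for r in rows
        if r["year"] >= min_year and (not labeled_only or r["labeled"] == "yes")
    ]
    return sorted(kept, key=lambda r: r["year"], reverse=True)


rows = load_rows(SAMPLE_CSV)
recent_labeled = select_datasets(rows, min_year=2019)
for r in recent_labeled:
    print(r["name"], r["year"], r["data_type"])
```

To work with a downloaded file instead of the inline sample, the text could be read with `open(path, newline="", encoding="utf-8")` and passed to `load_rows`; the filter criteria would then be expressed in terms of whatever columns the real export provides.
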
### Summary

This paper aims to help researchers better select and understand intrusion detection datasets by creating a comprehensive, continuously updated, and collaborative platform (COMIDDS), thereby improving the reliability and effectiveness of research.