A Review of Generative Models in Generating Synthetic Attack Data for Cybersecurity

Garima Agrawal, Amardeep Kaur, Sowmya Myneni
DOI: https://doi.org/10.3390/electronics13020322
IF: 2.9
2024-01-12
Electronics
Abstract: The ability of deep learning to process vast data and uncover concealed malicious patterns has spurred the adoption of deep learning methods within the cybersecurity domain. Nonetheless, a notable hurdle confronting cybersecurity researchers today is the acquisition of a sufficiently large dataset to effectively train deep learning models. Privacy and security concerns associated with using real-world organization data have made cybersecurity researchers seek alternative strategies, notably focusing on generating synthetic data. Generative adversarial networks (GANs) have emerged as a prominent solution, lauded for their capacity to generate synthetic data spanning diverse domains. Despite their widespread use, the efficacy of GANs in generating realistic cyberattack data remains a subject requiring thorough investigation. Moreover, the proficiency of deep learning models trained on such synthetic data to accurately discern real-world attacks and anomalies poses an additional challenge that demands exploration. This paper delves into the essential aspects of generative learning, scrutinizing their data generation capabilities, and conducts a comprehensive review to address the above questions. Through this exploration, we aim to shed light on the potential of synthetic data in fortifying deep learning models for robust cybersecurity applications.
Categories: Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied
What problem does this paper attempt to address?
The paper addresses the lack of sufficiently realistic and diverse attack datasets for cybersecurity research. Specifically:

1. **Data acquisition challenges**: Because of privacy and security concerns, real-world organizations are generally unwilling or unable to share their data, so cybersecurity researchers struggle to obtain enough data to train deep-learning models effectively.
2. **Limitations of existing datasets**: Existing cybersecurity datasets are usually produced through simulated attacks, such as red-team/blue-team exercises or hackathons. These datasets do provide some attack data, but the attack scenarios are often limited and specific to the simulated environment, so they lack diversity and realism.
3. **The need to generate synthetic data**: Defending against an ever-changing threat environment requires automated methods that generate diverse, realistic attack data without disrupting the normal operation of an organization's production environment.

To this end, the paper explores the application and potential of generative models, especially generative adversarial networks (GANs), for generating synthetic attack data. Its main objective is to evaluate whether GAN-generated synthetic attack data can improve the performance of deep-learning models in cybersecurity, in particular their ability to detect new or unseen real-world attacks.