Mind the Gap: Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning

Shicheng Zhou,Jingju Liu,Yuliang Lu,Jiahai Yang,Yue Zhang,Jie Chen
2025-02-11
Abstract:With increasing numbers of vulnerabilities exposed on the internet, autonomous penetration testing (pentesting) has emerged as a promising research area. Reinforcement learning (RL) is a natural fit for studying this topic. However, two key challenges limit the applicability of RL-based autonomous pentesting in real-world scenarios: (a) training environment dilemma -- training agents in simulated environments is sample-efficient while ensuring their realism remains challenging; (b) poor generalization ability -- agents' policies often perform poorly when transferred to unseen scenarios, with even slight changes potentially causing significant generalization gap. To this end, we propose GAP, a generalizable autonomous pentesting framework that aims to realizes efficient policy training in realistic environments and train generalizable agents capable of drawing inferences about other cases from one instance. GAP introduces a Real-to-Sim-to-Real pipeline that (a) enables end-to-end policy learning in unknown real environments while constructing realistic simulations; (b) improves agents' generalization ability by leveraging domain randomization and meta-RL <a class="link-external link-http" href="http://learning.Specially" rel="external noopener nofollow">this http URL</a>, we are among the first to apply domain randomization in autonomous pentesting and propose a large language model-powered domain randomization method for synthetic environment generation. We further apply meta-RL to improve agents' generalization ability in unseen environments by leveraging synthetic environments. The combination of two methods effectively bridges the generalization gap and improves agents' policy adaptation <a class="link-external link-http" href="http://performance.Experiments" rel="external noopener nofollow">this http URL</a> are conducted on various vulnerable virtual machines, with results showing that GAP can enable policy learning in various realistic environments, achieve zero-shot policy transfer in similar environments, and realize rapid policy adaptation in dissimilar environments.
Machine Learning,Cryptography and Security
What problem does this paper attempt to address?