Coverage-guided fuzzing for deep reinforcement learning systems

Xiaohui Wan,Tiancheng Li,Weibin Lin,Yi Cai,Zheng Zheng
DOI: https://doi.org/10.1016/j.jss.2024.111963
IF: 3.5
2024-04-01
Journal of Systems and Software
Abstract:While the past decade has witnessed a growing demand for employing deep reinforcement learning (DRL) in various domains to solve real-world problems, the reliability of DRL systems has become more of a concern. In particular, DRL agents are often trained on data from a potentially biased distribution over environmental settings, causing the trained agents to fail in certain cases despite high average-case performance. Hence, it is necessary and urgent to adequately test DRL agents to ensure the reliability of practical DRL systems. However, due to the fundamental difference in the programming paradigm and the development process, traditional software testing methodology cannot be applied directly to DRL systems. Given that, we introduce a novel testing framework for DRL systems, aiming to generate diverse test cases that can drive a DRL system to fail. Specifically, we design, implement and evaluate DRLFuzz, which is a coverage-guided fuzzing (CGF) framework for systematically testing DRL systems. Experimental results demonstrate that DRLFuzz can efficiently discover diverse failures in different DRL systems for various benchmark tasks. Compared with a random search baseline, DRLFuzz can generate 60% more failed cases in general. Additionally, the diversity of failed cases generated by DRLFuzz is increased by 4 . 6 % ∼ 14 . 1 % in terms of mean pairwise distance (MPD). Furthermore, our experiments also indicate that the failed cases generated by DRLFuzz can be utilized to fine-tune the DRL agent to eliminate the failures resulting from inadequate exploration during training and thus improve the reliability of DRL systems.
computer science, theory & methods, software engineering
What problem does this paper attempt to address?