Fuzzing the PHP Interpreter via Dataflow Fusion

Yuancheng Jiang, Chuqi Zhang, Bonan Ruan, Jiahao Liu, Manuel Rigger, Roland Yap, Zhenkai Liang
2024-10-29
Abstract: PHP, a dominant scripting language in web development, powers a vast range of websites, from personal blogs to major platforms. While existing research primarily focuses on PHP application-level security issues like code injection, memory errors within the PHP interpreter have been largely overlooked. These memory errors, prevalent due to the PHP interpreter's extensive C codebase, pose significant risks to the confidentiality, integrity, and availability of PHP servers. This paper introduces FlowFusion, the first automatic fuzzing framework specifically designed to detect memory errors in the PHP interpreter. FlowFusion leverages dataflow as an efficient representation of test cases maintained by PHP developers, merging two or more test cases to produce fused test cases with more complex code semantics. Moreover, FlowFusion employs strategies such as test mutation, interface fuzzing, and environment crossover to further facilitate memory error detection. In our evaluation, FlowFusion identified 56 unknown memory errors in the PHP interpreter, with 38 fixed and 4 confirmed. We compared FlowFusion against the official test suite and a naive test concatenation approach, demonstrating that FlowFusion can detect new bugs that these methods miss, while also achieving greater code coverage. Furthermore, FlowFusion outperformed state-of-the-art fuzzers AFL++ and Polyglot, covering 24% more lines of code after 24 hours of fuzzing under identical execution environments. FlowFusion has been acknowledged by PHP developers, and we believe our approach offers a practical tool for enhancing the security of the PHP interpreter.
Cryptography and Security
What problem does this paper attempt to address?
This paper addresses the detection of memory errors in the PHP interpreter. PHP is a scripting language widely used in web development, and its interpreter is prone to memory errors because of its large C codebase. These errors can seriously affect the confidentiality, integrity, and availability of PHP servers. Existing research, however, focuses mainly on security issues at the PHP application level (such as code injection) and has paid little attention to memory errors inside the interpreter itself.

To close this gap, the authors propose FlowFusion, an automated fuzzing framework specifically designed to detect memory errors in the PHP interpreter. FlowFusion combines four techniques:

1. **Dataflow Fusion**: Using dataflow as an efficient representation of the official test cases, and fusing the dataflows of two or more test cases to generate new test cases with more complex code semantics.
2. **Test Mutation**: Mutating the official test cases to introduce additional randomness.
3. **Interface Fuzzing**: Increasing the complexity of the fused test cases by calling random PHP functions.
4. **Environment Crossover**: Merging the execution environments of different test cases and inserting random configurations to increase diversity.

With these techniques, FlowFusion derives test cases with more complex semantics from the existing ones, which increases code coverage and exposes hidden memory errors. In the evaluation, FlowFusion discovered 56 previously unknown memory errors, of which 38 have been fixed and 4 have been confirmed. It also detected bugs that the official test suite and a naive test concatenation approach miss, and it covered 24% more lines of code than the state-of-the-art fuzzers AFL++ and Polyglot after 24 hours of fuzzing under identical execution environments.
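To make the dataflow fusion step described above more concrete, the following Python sketch shows one plausible way to connect two test bodies. It is not the authors' implementation: the helper names (`variables`, `mentions`, `fuse`), the `RESERVED` list, and the choice to link the tests with a single connecting assignment are all assumptions made for illustration. The sketch also assumes one PHP statement per line and does not distinguish variables from `$name` text inside string literals.

```python
import random
import re

# Matches PHP variable names such as $foo or $_bar1.
VAR_RE = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")

# Names this sketch never touches (an assumed, non-exhaustive list).
RESERVED = {"$this", "$GLOBALS", "$_GET", "$_POST", "$_SERVER"}


def variables(body: str) -> list[str]:
    """Return distinct variable names in order of first appearance."""
    seen: list[str] = []
    for name in VAR_RE.findall(body):
        if name not in RESERVED and name not in seen:
            seen.append(name)
    return seen


def mentions(line: str, name: str) -> bool:
    """True if `line` uses exactly `name` (so $y does not match $yy)."""
    pattern = re.escape(name) + r"(?![A-Za-z0-9_])"
    return re.search(pattern, line) is not None


def fuse(body_a: str, body_b: str, rng: random.Random) -> str:
    """Fuse two PHP test bodies by routing a value from A into B.

    Hypothetical strategy: pick one variable from each test and insert
    `$dst = $src;` right after the first line of B that mentions $dst,
    so later uses of $dst in B consume a value computed by A.
    """
    vars_a, vars_b = variables(body_a), variables(body_b)
    if not vars_a or not vars_b:
        # Nothing to connect: fall back to plain concatenation.
        return body_a + "\n" + body_b

    src, dst = rng.choice(vars_a), rng.choice(vars_b)
    lines_b = body_b.split("\n")
    first = next(i for i, line in enumerate(lines_b) if mentions(line, dst))
    lines_b.insert(first + 1, f"{dst} = {src};")
    return body_a + "\n" + "\n".join(lines_b)


if __name__ == "__main__":
    test_a = "$x = str_repeat('a', 3);\necho $x;"
    test_b = "$y = [1, 2];\nvar_dump(count($y));"
    print(fuse(test_a, test_b, random.Random(0)))
```

Compared with naive concatenation, where the two tests run back to back without interacting, the connecting assignment makes code from the second test operate on a value produced by the first. That kind of cross-test interaction is what the summary refers to as more complex code semantics.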
In summary, the paper's main contribution is dataflow fusion, a new method for automatically discovering memory errors in the PHP interpreter, whose effectiveness is verified by implementing and evaluating the FlowFusion framework.