MoCo: Fuzzing Deep Learning Libraries Via Assembling Code
Pin Ji,Yang Feng,Duo Wu,Lingyue Yan,Pengling Chen,Jia Liu,Zhihong Zhao
DOI: https://doi.org/10.1109/tse.2024.3509975
IF: 7.4
2024-01-01
IEEE Transactions on Software Engineering
Abstract: The rapidly developing deep learning (DL) techniques have been applied in software systems with various application scenarios. However, they could also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behaviors of DL systems. Previous research on fuzzing DL libraries still has limitations in the diversity of test inputs, the construction of test oracles, and the precision of detection. In this paper, we propose MoCo, a novel fuzzing testing method for DL libraries via assembling code. MoCo first disassembles the seed code file to obtain the template and code blocks, and then employs code block mutation operators (e.g., API replacement, random generation, and boundary checking) to generate more new code blocks adapted to the template. By inserting context-appropriate code blocks into the template step by step, MoCo can generate a tree of code files with intergenerational relations. According to the derivation relations in this tree and the applied mutation operators, we construct the test oracle based on the execution state consistency. Since the granularity of code assembly and mutation is controlled rather than randomly divergent, we can quickly pinpoint the lines of code where the bugs are located and the corresponding triggering conditions. We conduct a comprehensive experiment to evaluate the efficiency and effectiveness of MoCo using three widely-used DL libraries (i.e., TensorFlow, PyTorch, and Jittor). During the experiment, MoCo detects 64 new bugs of four types in three DL libraries, where 51 bugs have been confirmed, and 13 bugs have been fixed by developers.
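The disassemble, mutate, assemble, and check workflow described in the abstract can be illustrated with a toy sketch. This is not MoCo's implementation: it uses only the Python standard library in place of a DL library, treats each source line as one code block, and implements a single operator (API replacement). The names `Node`, `run`, `mutate`, `assemble`, `check`, and the `API_REPLACEMENTS` table are all hypothetical, and the oracle shown (sibling files derived by a semantics-preserving operator should reach the same execution state) is one simplified reading of "execution state consistency".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One generated code file plus the operator that produced its last line."""
    lines: list
    operator: str  # "seed", "original", or "api_replacement"
    children: list = field(default_factory=list)


def run(lines):
    """Execute a code file; return ("ok", value of `out`) or ("crash", exception name)."""
    env = {}
    try:
        exec("\n".join(lines), env)
        return "ok", env.get("out")
    except Exception as exc:
        return "crash", type(exc).__name__


# Hypothetical table of calls assumed to be interchangeable (an assumption of this sketch).
API_REPLACEMENTS = {"sum(": "math.fsum("}


def mutate(block):
    """Return (operator, block) variants of one code block; the original comes first."""
    variants = [("original", block)]
    for old, new in API_REPLACEMENTS.items():
        if old in block:
            variants.append(("api_replacement", block.replace(old, new)))
    return variants


def assemble(template, blocks):
    """Insert blocks into the template one at a time, growing a tree of code files."""
    root = Node(list(template), "seed")
    frontier = [root]
    for block in blocks:
        next_frontier = []
        for parent in frontier:
            for operator, variant in mutate(block):
                child = Node(parent.lines + [variant], operator)
                parent.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root


def check(node):
    """Report siblings whose execution state differs from the unmutated sibling."""
    findings = []
    states = [(child, run(child.lines)) for child in node.children]
    for child, state in states[1:]:
        expected = states[0][1]
        if state != expected:
            # The last line of the child is the only difference, so it is the suspect.
            findings.append((child.lines[-1], child.operator, expected, state))
    for child in node.children:
        findings.extend(check(child))
    return findings


# "Disassembled" seed: the import acts as the template, the rest are code blocks.
root = assemble(["import math"], ["data = [1, 2, 3]", "out = sum(data)"])
print(check(root))  # prints []
```

Because each child differs from its parent by exactly one inserted block, any reported inconsistency points at a single line and a single operator, which mirrors the abstract's argument that controlled granularity makes bugs easy to localize.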