Python Coverage Guided Fuzzing for Deep Learning Framework

Yuanping Nie, Xiong Xiao, Bing Yang, Hanqing Li, Long Luo, Hongfang Yu, Gang Sun
DOI: https://doi.org/10.1109/eeiss62553.2024.00007
2024-01-01
Abstract: In recent years, deep learning (DL) applications have been widely used in both industrial and academic domains. Bugs in DL frameworks have become one of the leading causes of DL model training and deployment failures. Fuzzing has emerged as one of the most effective and commonly used techniques for detecting bugs in DL frameworks. However, existing fuzzing techniques focus on testing the high-level APIs and rely on random generation to explore the vast input space of API parameters. This approach suffers from low effectiveness because it cannot distinguish which lines in the lower-level components have been executed. We analyzed the structure of the most popular DL frameworks (e.g., TensorFlow and PyTorch) and found that line coverage information from the Python layer can be used to characterize the execution of the underlying layers. We therefore propose PCFuzz, a Python coverage-guided fuzzing technique, to address this problem. The main idea of PCFuzz is to guide fuzzing by the Python line coverage of lower-level components. To collect Python coverage at low cost, we designed a lightweight Python instrumentation approach specifically for DL frameworks. Building on this instrumentation, PCFuzz uses coverage feedback to optimize mutation scheduling, thereby enhancing the overall effectiveness of DL framework fuzzing. We conduct experiments on the most popular DL frameworks, TensorFlow and PyTorch, and compare PCFuzz with the state-of-the-art model-level fuzzer LEMON and API-level fuzzer FreeFuzz. The results show that PCFuzz is more efficient at covering the code in the low-level Python libraries and raw operations. Specifically, within the same time budget, PCFuzz achieves on average 13.9x higher Python line coverage than LEMON and on average 52.56% higher Python line coverage than FreeFuzz. In addition, during our preliminary experiments, PCFuzz discovered one previously unknown bug in TensorFlow.
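
The coverage-guided loop described in the abstract can be illustrated with a minimal sketch. This is not PCFuzz's implementation. The toy `target` function, the mutator names, and the weight-update rule are hypothetical, and the standard `sys.settrace` hook stands in for the paper's lightweight instrumentation, whose details the abstract does not give. The sketch only shows the general idea: record which Python lines an input executes, keep inputs that reach new lines, and favor mutators that have produced new coverage.

```python
import random
import sys


def target(x, y):
    """Toy stand-in for a Python-layer framework function (hypothetical)."""
    if x < 0:
        raise ValueError("negative input")
    if x > 100:
        if y == 0:
            return "large-zero"
        return "large"
    if y % 2 == 0:
        return "even"
    return "odd"


def run_with_line_coverage(func, args):
    """Run func(*args) and return the set of line numbers executed inside func."""
    covered = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # skip frames that do not belong to func
        if event == "line":
            covered.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    except Exception:
        pass  # a real fuzzer would save the failing input for triage
    finally:
        sys.settrace(previous)
    return covered


# Hypothetical mutators over an (x, y) input pair.
MUTATORS = {
    "negate_x": lambda x, y: (-x, y),
    "scale_x": lambda x, y: (x * 128, y),
    "bump_y": lambda x, y: (x, y + 1),
    "zero_y": lambda x, y: (x, 0),
}


def fuzz(func, seed, iterations=200, rng=None):
    """Coverage-guided loop with a simple coverage-weighted mutator schedule."""
    rng = rng or random.Random(0)
    corpus = [seed]
    total = run_with_line_coverage(func, seed)
    weights = {name: 1.0 for name in MUTATORS}
    names = list(MUTATORS)
    for _ in range(iterations):
        parent = rng.choice(corpus)
        # Mutators that found new lines earlier are chosen more often.
        name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        child = MUTATORS[name](*parent)
        new_lines = run_with_line_coverage(func, child) - total
        if new_lines:
            total |= new_lines
            corpus.append(child)  # keep inputs that reach new code
            weights[name] += len(new_lines)
    return total, corpus, weights


if __name__ == "__main__":
    total, corpus, weights = fuzz(target, seed=(1, 1))
    print("covered lines:", sorted(total))
    print("corpus:", corpus)
    print("mutator weights:", weights)
```

Tracing every line with `sys.settrace` is slow on real frameworks, which is presumably why the paper designs a dedicated lightweight instrumentation; the hook is used here only because it is available in the standard library.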