DeepDiffer: Find Deep Learning Compiler Bugs Via Priority-guided Differential Fuzzing

Kuiliang Lin, Xiangpu Song, Yingpei Zeng, Shanqing Guo
DOI: https://doi.org/10.1109/qrs60937.2023.00066
2023-01-01
Abstract: Recently, deep learning (DL) compilers have been widely developed to optimize the deployment of DL models. These compilers transform DL models into a high-level intermediate representation (IR), lower it to a low-level IR, and ultimately generate optimized code for different hardware targets. However, DL compilers are not immune to generating incorrect code, which can have severe consequences. Testing techniques for low-level IR are limited, and efficient approaches for detecting certain categories of non-crashing bugs are lacking. In this paper, we address the limitations of existing low-level IR testing techniques for DL compilers and introduce DeepDiffer, a priority-guided differential testing framework designed to detect bugs caused by low-level optimizations in a DL compiler, specifically TVM. We propose a novel DL compiler coverage metric and define an optimization goal that maximizes the detection of valuable differences between DL compilers. Our experiments demonstrate that DeepDiffer outperforms existing low-level IR fuzzers and detects a wider range of bug types. DeepDiffer has identified 13 bugs in TVM, which fall into 9 distinct root causes; 9 of these bugs were previously unknown. We have reported these bugs to the TVM community, which has confirmed them.
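The abstract does not specify DeepDiffer's coverage metric or priority function, so the following is only a generic sketch of priority-guided differential fuzzing, not the authors' implementation. Two toy Python functions stand in for two compilation paths of the same program; one carries an injected bug that silently drops a trailing element, which models a non-crashing miscompilation. The names `reference_backend`, `optimized_backend`, `mutate`, `fuzz`, the coverage labels, and the scoring rule (1 point per newly covered label, 2 points for an output divergence) are all hypothetical and do not use any TVM API.

```python
import heapq
import random

# Hypothetical stand-ins for two compilation paths of the same program.
# Neither is TVM code; they only model "same semantics, different code".
def reference_backend(xs):
    """Reference path: sum of squares, plus a trivial coverage set."""
    return sum(x * x for x in xs), {"ref"}

def optimized_backend(xs):
    """'Optimized' path: unrolled by 2, with an injected bug that drops
    the trailing element when len(xs) is odd. Returns (output, coverage)."""
    cov = set()
    total = 0
    n = len(xs)
    for i in range(0, n - 1, 2):
        total += xs[i] * xs[i] + xs[i + 1] * xs[i + 1]
        cov.add("unrolled_body")
    if n % 2 == 1:
        cov.add("odd_tail")  # bug: xs[-1] is never added to total
    if n == 0:
        cov.add("empty")
    return total, cov

def mutate(xs, rng):
    """Return a copy of xs with one element appended, deleted, or overwritten."""
    xs = list(xs)
    op = rng.randrange(3)
    if op == 0 or not xs:
        xs.append(rng.randint(-5, 5))
    elif op == 1:
        xs.pop(rng.randrange(len(xs)))
    else:
        xs[rng.randrange(len(xs))] = rng.randint(-5, 5)
    return xs

def fuzz(seeds, iterations=200, seed=0):
    """Priority-guided differential fuzzing with a max-priority queue
    (heapq is a min-heap, so priorities are stored negated)."""
    rng = random.Random(seed)
    seen_cov, diffs, queue, tick = set(), [], [], 0

    def evaluate(xs):
        ref_out, _ = reference_backend(xs)
        opt_out, cov = optimized_backend(xs)
        new_cov = cov - seen_cov
        seen_cov.update(cov)
        differs = ref_out != opt_out
        if differs:
            diffs.append(xs)  # record the input that exposes a divergence
        # Score: 1 per newly covered label, plus 2 for a divergence.
        return len(new_cov) + (2 if differs else 0)

    for s in seeds:
        heapq.heappush(queue, (-evaluate(s), tick, s))
        tick += 1
    for _ in range(iterations):
        prio, _, parent = heapq.heappop(queue)
        child = mutate(parent, rng)
        heapq.heappush(queue, (-evaluate(child), tick, child))
        # Re-queue the parent slightly lower so other inputs get a turn.
        heapq.heappush(queue, (prio + 1, tick + 1, parent))
        tick += 2
    return diffs, seen_cov

diffs, coverage = fuzz([[1, 2], [3]])
print(len(diffs), sorted(coverage))
```

The unique `tick` counter breaks ties in the heap so that Python never has to compare the input lists themselves. Inputs that reach new coverage or expose a divergence are mutated first, which is the intuition behind prioritizing "valuable differences"; a real DL compiler fuzzer would replace the toy functions with compiled model outputs and a compiler-specific coverage signal.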