Abstract: In the past decade, Deep Learning (DL) systems have been widely deployed in application domains such as natural language processing, healthcare, activity recognition, and autonomous driving. Ensuring the correctness of DL systems is extremely challenging (e.g., due to their intrinsic nondeterminism), and bugs in DL systems can have serious consequences and may even threaten human lives. Researchers have explored various techniques to test, analyze, and verify DL models, since model quality directly affects system behavior. More recently, researchers have also proposed techniques for testing the underlying operator-level DL libraries (such as TensorFlow and PyTorch), which provide general binary implementations for each high-level DL operator and are the foundation for running DL models on different hardware platforms. However, there is still limited work targeting the reliability of the emerging tensor compilers (also known as DL compilers), which aim to automatically compile high-level tensor computation graphs directly into high-performance binaries for better efficiency, portability, and scalability than traditional operator-level libraries. In this paper, we therefore target the important problem of tensor compiler testing and propose Tzer, a practical fuzzing technique for the widely used TVM tensor compiler. Tzer focuses on mutating TVM's low-level Intermediate Representation (IR), because the high-level IR offers only a limited mutation space. More specifically, Tzer leverages both general-purpose and tensor-compiler-specific mutators, guided by coverage feedback, for diverse and evolutionary IR mutation. Furthermore, since tensor compilers provide various passes (i.e., transformations) for IR optimization, Tzer also performs pass mutation in tandem with IR mutation for more effective fuzzing.
Our experimental results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing, with 75% higher coverage and 50% more valuable tests than the second-best technique. An ablation study also validates the individual components of Tzer. To date, Tzer has detected 49 previously unknown bugs in TVM, with 37 bugs confirmed and 25 bugs fixed (PR merged).
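
The coverage-guided loop with joint IR and pass mutation described above can be illustrated with a toy sketch. This is not Tzer's implementation and it does not use any TVM API: the "IR" is just a list of integers, the "passes" are plain Python functions, and coverage is a set of made-up coverage points. All names (`toy_compile`, `mutate_ir`, `mutate_passes`, `fuzz`) are hypothetical and exist only for illustration.

```python
import random

# Hypothetical stand-ins: a toy "IR" is a list of small integers,
# and a toy "pass" is a function that rewrites that list.
def pass_fold(ir):
    return [x for x in ir if x != 0]   # drop zeros

def pass_reverse(ir):
    return ir[::-1]

def pass_double(ir):
    return [2 * x for x in ir]

PASSES = [pass_fold, pass_reverse, pass_double]

def toy_compile(ir, passes):
    """Apply the pass sequence and return the set of coverage points hit."""
    covered = set()
    for p in passes:
        ir = p(ir)
        covered.add(("pass", p.__name__))
    for x in ir:
        covered.add(("bucket", x % 7))
    if len(ir) > 4:
        covered.add(("long",))
    return covered

def mutate_ir(ir, rng):
    """General-purpose IR mutation: insert, replace, or delete one element."""
    ir = list(ir)
    choice = rng.randrange(3)
    if choice == 0 or not ir:
        ir.insert(rng.randrange(len(ir) + 1), rng.randrange(-8, 9))
    elif choice == 1:
        ir[rng.randrange(len(ir))] = rng.randrange(-8, 9)
    else:
        del ir[rng.randrange(len(ir))]
    return ir

def mutate_passes(passes, rng):
    """Pass mutation: replace one pass or insert a new one (length capped at 6)."""
    passes = list(passes)
    if passes and rng.random() < 0.5:
        passes[rng.randrange(len(passes))] = rng.choice(PASSES)
    else:
        passes.insert(rng.randrange(len(passes) + 1), rng.choice(PASSES))
    return passes[:6]

def fuzz(iterations=500, seed=0):
    rng = random.Random(seed)
    pool = [([1], [])]                     # seed pool of (IR, pass sequence) pairs
    total = set(toy_compile(*pool[0]))     # coverage accumulated so far
    for _ in range(iterations):
        ir, passes = rng.choice(pool)
        if rng.random() < 0.5:
            ir = mutate_ir(ir, rng)
        else:
            passes = mutate_passes(passes, rng)
        cov = toy_compile(ir, passes)
        if not cov <= total:               # keep only tests that add new coverage
            total |= cov
            pool.append((ir, passes))
    return pool, total
```

Calling `fuzz()` returns the evolved seed pool and the accumulated coverage set. The design point the sketch tries to capture is that a mutated (IR, pass sequence) pair is retained as a new seed only when it reaches coverage not seen before, so later mutations build on earlier discoveries.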

Coverage-guided tensor compiler fuzzing with joint IR-pass mutation
