A Tale of Two DL Cities: When Library Tests Meet Compiler

Qingchao Shen, Yongqiang Tian, Haoyang Ma, Junjie Chen, Lili Huang, Ruifeng Fu, Shing-Chi Cheung, Zan Wang
2024-08-14
Abstract: Deep Learning (DL) compilers typically load a DL model and optimize it with an intermediate representation. Existing DL compiler testing techniques mainly focus on model optimization stages, but rarely explore bug detection at the model loading stage. Effectively testing the model loading stage requires covering diverse usages of each DL operator from various DL libraries, which shares a common objective with DL library testing, indicating that the embedded knowledge in DL library tests is beneficial for testing the model loading stage of DL compilers. In this work, we propose OPERA to extract such domain knowledge from the test inputs for DL libraries. OPERA constructs diverse tests from the various test inputs for DL libraries (including the test inputs documented in DL libraries and those generated by recent fuzzers). In addition, it incorporates a diversity-based test prioritization strategy to migrate and execute those test inputs that are more likely to detect diverse bugs earlier. We considered three sources of tests in DL libraries for migration and used eight frontends from three DL compilers (i.e., TVM, TensorRT, and OpenVINO) for evaluation. OPERA detected 170 previously unknown bugs in total, 90 of which have been confirmed/fixed by developers, demonstrating the effectiveness of the migration-based idea. The test prioritization strategy in OPERA improves testing efficiency with migrated tests by 11.9% to 47.4% on average compared to general test prioritization strategies.
Software Engineering
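The abstract's diversity-based test prioritization can be illustrated with a generic greedy sketch. It assumes that each migrated test is summarized as a set of string features (operator name, parameter settings), that diversity is measured with Jaccard distance, and that tests are ordered farthest-first: each step schedules the test least similar to everything already scheduled. This is a common diversity heuristic, not OPERA's published algorithm; the functions `jaccard_distance` and `prioritize` and the example test names are hypothetical.

```python
from typing import Dict, FrozenSet, List

def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty feature sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def prioritize(tests: Dict[str, FrozenSet[str]]) -> List[str]:
    """Greedy farthest-first ordering of test IDs by feature-set diversity."""
    if not tests:
        return []
    remaining = dict(tests)
    # Seed with the feature-richest test; iterating in sorted order makes ties deterministic.
    first = max(sorted(remaining), key=lambda t: len(remaining[t]))
    order = [first]
    del remaining[first]
    # Distance from each unscheduled test to its nearest already-scheduled test.
    nearest = {t: jaccard_distance(f, tests[first]) for t, f in remaining.items()}
    while remaining:
        # Schedule the test least similar to everything scheduled so far.
        nxt = max(sorted(remaining), key=lambda t: nearest[t])
        order.append(nxt)
        del remaining[nxt]
        del nearest[nxt]
        for t, f in remaining.items():
            nearest[t] = min(nearest[t], jaccard_distance(f, tests[nxt]))
    return order

# Hypothetical migrated tests, each described by operator and parameter features.
example_tests = {
    "conv2d_basic": frozenset({"op=conv2d", "stride=1", "padding=0"}),
    "conv2d_same": frozenset({"op=conv2d", "stride=1", "padding=same"}),
    "conv2d_dilated": frozenset({"op=conv2d", "stride=2", "padding=0", "dilation=2"}),
    "relu_basic": frozenset({"op=relu"}),
}
print(prioritize(example_tests))
# ['conv2d_dilated', 'relu_basic', 'conv2d_same', 'conv2d_basic']
```

Here `conv2d_basic` is scheduled last because `conv2d_same` and `conv2d_dilated` already cover most of its features, while the unrelated `relu_basic` moves ahead. A real prioritizer would extract features automatically from the migrated test inputs rather than from hand-written labels.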
### Problems the paper attempts to solve

This paper addresses the insufficient testing of the model-loading stage of deep learning (DL) compilers. Existing DL compiler testing techniques mainly target the optimization stages (hardware-independent and hardware-specific optimization) and largely ignore model loading. In the model-loading stage, the compiler converts a model from a DL library (such as PyTorch or Keras) into a high-level intermediate representation (IR), which requires handling the diverse usages of each operator and its parameter combinations.

#### Main problem description

1. **Limitations of existing testing techniques**:
   - Existing DL compiler testing techniques focus on the optimization stages and overlook the model-loading stage.
   - Techniques such as NNSmith generate complex models for stress testing, but they are ill-suited to the model-loading stage, which needs to cover the diverse usages of individual operators rather than complex dependencies between operators.
2. **Difficulty of manually developing test tools**:
   - Hand-writing test generators that follow each library's grammar is time-consuming and error-prone, especially given the number of DL libraries and the operators each one supports.
   - Operators usually take many parameters with intricate constraints among them, which makes such generators even harder to build.
3. **Motivation for migrating DL library test knowledge**:
   - DL library testing and testing of the compiler's model-loading stage share the same goal: ensuring that operators behave correctly under diverse usages.
   - The knowledge embedded in DL library tests (for example, concrete usage examples of each operator) is therefore directly useful for testing the model-loading stage of DL compilers.
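The migration idea can be sketched as two steps: wrap one operator usage recorded from a library test into a single-operator model that a compiler frontend can load, then compare the compiler's result with the library's own result. Everything here is an assumption for illustration: the `LibraryTestInput` record format, `to_model_source`, and `differential_check` are hypothetical names; PyTorch is used only as an example source library; and the crash/inconsistency oracle is a common choice in DL compiler testing that this summary does not spell out for OPERA. The "compiled" runners are plain Python stand-ins, not calls into TVM, TensorRT, or OpenVINO.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class LibraryTestInput:
    """One operator invocation harvested from a DL library test (hypothetical format)."""
    api: str                      # fully qualified operator API name
    input_shape: Tuple[int, ...]  # shape of the single tensor argument
    kwargs: Dict[str, Any] = field(default_factory=dict)  # non-tensor parameters

def to_model_source(t: LibraryTestInput) -> str:
    """Render source for a single-operator PyTorch module wrapping the recorded call."""
    extra = "".join(f", {k}={v!r}" for k, v in sorted(t.kwargs.items()))
    return (
        "import torch\n\n"
        "class SingleOpModel(torch.nn.Module):\n"
        "    def forward(self, x):\n"
        f"        return {t.api}(x{extra})\n\n"
        f"example_input = torch.randn{t.input_shape!r}\n"
    )

Runner = Callable[[List[float]], List[float]]

def differential_check(reference: Runner, compiled: Runner,
                       x: List[float], atol: float = 1e-5) -> str:
    """Classify one migrated test as a compiler crash, an output mismatch, or a pass."""
    try:
        got = compiled(x)
    except Exception as exc:  # any exception on the compiler side counts as a crash
        return f"crash: {type(exc).__name__}"
    want = reference(x)
    if len(got) != len(want) or any(abs(g - w) > atol for g, w in zip(got, want)):
        return "inconsistent"
    return "pass"

# Example: a leaky_relu usage of the kind found in PyTorch's documentation.
case = LibraryTestInput("torch.nn.functional.leaky_relu", (2, 3), {"negative_slope": 0.1})
print(to_model_source(case))

def library_leaky_relu(xs: List[float]) -> List[float]:
    # Stand-in for the library's reference result.
    return [v if v > 0 else 0.1 * v for v in xs]

def frontend_dropping_slope(xs: List[float]) -> List[float]:
    # Simulated model-loading bug: the frontend ignores negative_slope.
    return [v if v > 0 else 0.0 for v in xs]

print(differential_check(library_leaky_relu, library_leaky_relu, [-1.0, 2.0]))       # pass
print(differential_check(library_leaky_relu, frontend_dropping_slope, [-1.0, 2.0]))  # inconsistent
```

The simulated bug shows why parameter diversity matters for model loading: a frontend that silently drops one operator parameter still loads the model without error, and only a test that exercises a non-default `negative_slope` exposes the mismatch.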
### Overview of the solution

To address these problems, the authors propose OPERA (Operator Adapter), a migration-based testing technique. Its main contributions are:

1. **Migrating DL library test knowledge**:
   - The idea of reusing knowledge from DL library tests to strengthen testing of the model-loading stage of DL compilers.
2. **Designing the migration technique**:
   - OPERA draws on multiple test sources (such as tests documented in DL libraries and tests generated by recent fuzzers) and uses a diversity-based test prioritization strategy to decide which migrated tests to execute first.
3. **Evaluating effectiveness**:
   - An evaluation on eight frontends of three popular DL compilers (TVM, TensorRT, and OpenVINO) detected 170 previously unknown bugs, 90 of which have been confirmed or fixed by developers.
4. **Improving testing efficiency**:
   - Compared with general test prioritization strategies, the diversity-based prioritization improves testing efficiency with migrated tests by 11.9% to 47.4% on average.

### Conclusion

By migrating knowledge from DL library tests, the paper enables effective bug detection in the model-loading stage of DL compilers, a stage that existing testing techniques largely overlook, and its diversity-based prioritization makes this testing more efficient than general prioritization strategies.