CTOS: Compiler Testing for Optimization Sequences of LLVM

He Jiang, Zhide Zhou, Zhilei Ren, Jingxuan Zhang, Xiaochen Li
DOI: https://doi.org/10.1109/tse.2021.3058671
Impact factor: 7.4
2021-01-01
IEEE Transactions on Software Engineering
Abstract: Optimization sequences are often employed in compilers to improve the performance of programs, but they may trigger critical compiler bugs, e.g., compiler crashes. Although many methods have been developed to automatically test compilers, no systematic work has been conducted to detect compiler bugs triggered by applying arbitrary optimization sequences. Because the numbers of possible optimization sequences and testing programs are enormous, two main challenges must be addressed to resolve this problem: acquiring representative optimization sequences and selecting representative testing programs. In this study, we propose CTOS, a novel compiler testing method based on differential testing, for detecting compiler bugs caused by optimization sequences of LLVM. First, CTOS uses Doc2Vec to transform optimization sequences into vectors that capture both the optimizations and their order. Second, CTOS constructs vector representations of testing programs with a method based on region graphs and call relationships, so that both the semantics and the structure of programs are captured. Third, using the vector representations of optimization sequences and testing programs, a "centroid"-based selection scheme is proposed to address the two challenges above. Finally, CTOS takes the representative optimization sequences and testing programs as inputs and tests each testing program with every representative optimization sequence. If, for a given testing program, one output differs from the majority of the others, the corresponding optimization sequence is deemed to trigger a compiler bug. Our evaluation demonstrates that CTOS significantly outperforms the baselines in average bug-finding capability, by up to 24.76%–50.57%.
Within seven months of evaluation on LLVM, we reported 104 valid bugs of 5 types, of which 21 have been confirmed or fixed. Most of these bugs are crash bugs (57) and wrong-code bugs (24). In total, 47 unique optimizations were identified as faulty, 15 of which are loop-related optimizations.
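The selection and test-oracle steps described in the abstract can be sketched roughly as follows. The abstract does not give the details of the "centroid"-based scheme, so this Python sketch assumes it can be approximated by plain k-means over the embedding vectors (the Doc2Vec or region-graph vectors are assumed to be given as lists of floats), keeping the input vector closest to each centroid as a representative. The oracle flags any optimization sequence whose output for a test program differs from the most common output. The names `select_representatives` and `suspicious_sequences` and the toy data are hypothetical illustrations, not part of CTOS.

```python
from collections import Counter


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def select_representatives(vectors, k, iters=20):
    """Hypothetical stand-in for centroid-based selection (assumption: k-means).

    Clusters `vectors` into k groups and returns, for each centroid, the index
    of the closest input vector (duplicates removed, in centroid order).
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and len(vectors)")
    # Deterministic farthest-first initialisation instead of random seeding.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(_dist2(v, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(k), key=lambda j: _dist2(v, centroids[j]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [_mean(cl) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    reps = []
    for c in centroids:
        best = min(range(len(vectors)), key=lambda i: _dist2(vectors[i], c))
        if best not in reps:
            reps.append(best)
    return reps


def suspicious_sequences(outputs):
    """Majority-vote oracle for differential testing.

    `outputs` maps an optimization-sequence id to the observed result of one
    test program compiled with that sequence (stdout, exit code, "CRASH", ...).
    Sequences whose result differs from the most common one are flagged.
    """
    if not outputs:
        return []
    majority, _ = Counter(outputs.values()).most_common(1)[0]
    return sorted(seq for seq, out in outputs.items() if out != majority)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real Doc2Vec vectors would be much longer.
    seq_vectors = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0],
                   [10.0, 10.0], [12.0, 10.0], [11.0, 10.0]]
    print(select_representatives(seq_vectors, k=2))  # [2, 5]

    runs = {"seq_0": "42", "seq_1": "42", "seq_2": "CRASH", "seq_3": "42"}
    print(suspicious_sequences(runs))  # ['seq_2']
```

Farthest-first seeding is used only to keep the sketch deterministic; when outputs tie for most common, `Counter.most_common` picks one of them, so a real tool would need an explicit policy for programs without a clear majority.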