Towards Efficient Large-Scale Interprocedural Program Static Analysis on Distributed Data-Parallel Computation

Rong Gu, Zhiqiang Zuo, Xi Jiang, Han Yin, Zhaokang Wang, Linzhang Wang, Xuandong Li, Yihua Huang
DOI: https://doi.org/10.1109/tpds.2020.3036190
IF: 5.3
2020-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Static program analysis has been widely applied throughout the software development process for bug detection, code optimization, testing, and other tasks. Although researchers have made significant progress in static program analysis, performing sophisticated interprocedural analysis on large-scale modern software remains challenging. The underlying reason is that such analysis is highly computation- and memory-intensive, which leads to poor efficiency and scalability. In this article, we introduce an efficient, scalable distributed solution for sophisticated static analysis. Specifically, we propose a data-parallel algorithm and a join-process-filter computation model for CFL-reachability-based interprocedural analysis. Based on these, we develop an efficient distributed static analysis engine called BigSpa, which is composed of an offline batch static program analysis system and an online incremental static program analysis system. BigSpa is general: it supports any static analysis task that can be expressed as a CFL-reachability problem. We evaluate the performance of BigSpa on real-world large-scale software datasets. Our experiments show that the offline batch system outperforms the most advanced available analysis tools by more than an order of magnitude, and that for incremental analysis with small batch updates on the same datasets, the online analysis system achieves near real-time response.
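To make the CFL-reachability formulation concrete, the following is a minimal single-machine sketch of an iterative join, process, and filter loop. It is not BigSpa's implementation, and the abstract does not specify the exact steps of the paper's model. The function name `cfl_closure`, the `(src, dst, label)` edge encoding, and the `unary`/`binary` grammar tables are all assumptions made for illustration. The grammar is assumed to be normalized so that every production has one or two symbols on its right-hand side.

```python
from collections import defaultdict


def cfl_closure(edges, unary, binary):
    """Compute all edges derivable under a normalized context-free grammar.

    edges:  iterable of (src, dst, label) tuples (hypothetical encoding)
    unary:  dict mapping B -> set of A for productions A -> B
    binary: dict mapping (B, C) -> set of A for productions A -> B C
    """
    known = set()
    delta = set(edges)  # edges discovered in the previous round
    while delta:
        known |= delta
        # Index known edges by endpoint so the join is a simple lookup.
        by_src = defaultdict(set)
        by_dst = defaultdict(set)
        for u, v, label in known:
            by_src[u].add((v, label))
            by_dst[v].add((u, label))

        candidates = set()
        for u, v, label in delta:
            # Process: unary productions A -> label.
            for a in unary.get(label, ()):
                candidates.add((u, v, a))
            # Join: the new edge is the left operand of A -> label C.
            for w, right in by_src[v]:
                for a in binary.get((label, right), ()):
                    candidates.add((u, w, a))
            # Join: the new edge is the right operand of A -> B label.
            for t, left in by_dst[u]:
                for a in binary.get((left, label), ()):
                    candidates.add((t, v, a))

        # Filter: keep only edges that have not been seen before.
        delta = candidates - known
    return known


# Usage: transitive closure expressed as the grammar T -> e | T e.
graph = {(1, 2, "e"), (2, 3, "e"), (3, 4, "e")}
result = cfl_closure(graph, unary={"e": {"T"}}, binary={("T", "e"): {"T"}})
t_edges = sorted((u, v) for u, v, label in result if label == "T")
print(t_edges)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Only the edges found in the previous round are joined against the known set, so each round does work proportional to what is new. In a distributed data-parallel setting, the join and filter steps would presumably be partitioned by vertex, but that mapping is an assumption here rather than something the abstract states.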