Octopus: Scaling Value-Flow Analysis via Parallel Collection of Realizable Path Conditions

Wensheng Tang, Dejun Dong, Shijie Li, Chengpeng Wang, Peisen Yao, Jinguo Zhou, Charles Zhang
DOI: https://doi.org/10.1145/3632743
IF: 3.685
2024-01-24
ACM Transactions on Software Engineering and Methodology
Abstract: Value-flow analysis is a fundamental technique in program analysis, benefiting various clients, such as memory corruption detection and taint analysis. However, existing efforts suffer from low potential speedup, which leads to a deficiency in scalability. In this work, we present a parallel algorithm, Octopus, to collect path conditions for realizable paths efficiently. Octopus builds on the realizability decomposition to collect the intraprocedural path conditions of different functions simultaneously and on demand, and obtains realizable path conditions by concatenation, which achieves a high potential speedup in parallelization. We implement Octopus as a tool and evaluate it over 15 real-world programs. The experiment shows that Octopus significantly outperforms the state-of-the-art algorithms. In particular, it detects NPD bugs for the project llvm with 6.3 MLoC within 6.9 minutes under the 40-thread setting. We also state and prove several theorems to demonstrate the soundness, completeness, and high potential speedup of Octopus. Our empirical and theoretical results demonstrate the great potential of Octopus in supporting various program analysis clients. The implementation has been officially deployed at Ant Group, scaling the nightly code scan for massive FinTech applications.
computer science, software engineering
What problem does this paper attempt to address?
The paper addresses the scalability of value-flow analysis, specifically by parallelizing the collection of realizable path conditions. It argues that existing value-flow analyses offer only low potential speedup when parallelized, which limits their scalability on large programs. Although current methods improve precision through context-sensitive analysis, the dependencies between function calls restrict the degree of parallelism they can exploit, so the desired acceleration is hard to achieve.
To address this, the paper proposes a parallel algorithm named "Octopus." Its core idea is "realizability decomposition," which decouples path condition collection from realizability reasoning so that the two can be handled separately and in parallel. The paper also introduces a graph representation called the "value-flow segment graph," which still allows realizable paths to be identified and supports building realizable path conditions by combining intraprocedural path conditions.
The Octopus algorithm has three main stages:

1. **Value-Flow Segment Graph Generation**: Constructing the value-flow segment graph for each function, where each segment summarizes intraprocedural value-flow paths.
2. **Realizable Segment Path Search**: Searching for realizable segment paths on the value-flow segment graph.
3. **Path Condition Collection**: Recovering path conditions from the realizable segment paths.
With these techniques, Octopus collects realizable path conditions efficiently and, in the experimental evaluation, significantly outperforms existing methods. According to the abstract, the implementation has also been deployed at Ant Group, where it scales nightly code scans for large FinTech applications.
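The three stages above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the segment names, the `SEGMENTS`, `LINKS`, and `INTRA_CONDITION` tables, and the functions `realizable_paths`, `collect_condition`, and `path_conditions` are all hypothetical, conditions are plain strings rather than solver formulas, and the segment graph of stage 1 is assumed to be given as toy input. Realizability is approximated here by matching each return site against the most recent unmatched call site.

```python
from concurrent.futures import ThreadPoolExecutor

# Stage 1 (assumed already done): a toy value-flow segment graph.
# Each hypothetical segment summarizes an intraprocedural value flow:
# segment id -> (function, source node, target node).
SEGMENTS = {
    "s1": ("main", "p = null", "call@1"),
    "s2": ("foo", "foo.entry", "foo.exit"),
    "s3": ("main", "ret@1", "deref q"),
    "s4": ("main", "ret@2", "deref r"),
}
# Interprocedural links between segments: (kind, call site, next segment).
LINKS = {
    "s1": [("call", 1, "s2")],
    "s2": [("ret", 1, "s3"), ("ret", 2, "s4")],
}
# Toy intraprocedural path conditions, one per segment.
INTRA_CONDITION = {"s1": "a > 0", "s2": "x != 0", "s3": "b < 5", "s4": "c == 1"}


def realizable_paths(start, sinks):
    """Stage 2: depth-first search that keeps only call/return-matched paths."""
    paths = []

    def dfs(seg, stack, path):
        if seg in sinks:
            paths.append(path)
        for kind, site, nxt in LINKS.get(seg, []):
            if nxt in path:  # skip cycles in this toy search
                continue
            if kind == "call":
                dfs(nxt, stack + [site], path + [nxt])
            elif not stack or stack[-1] == site:
                # A return must match the latest pending call; with no
                # pending call, the path simply leaves its start function.
                dfs(nxt, stack[:-1], path + [nxt])

    dfs(start, [], [start])
    return paths


def collect_condition(seg):
    """Stand-in for intraprocedural condition collection on one segment."""
    return INTRA_CONDITION[seg]


def path_conditions(paths):
    """Stage 3: collect the needed segment conditions in parallel, then concatenate."""
    needed = sorted({seg for path in paths for seg in path})
    with ThreadPoolExecutor() as pool:
        conds = dict(zip(needed, pool.map(collect_condition, needed)))
    return [" && ".join(conds[seg] for seg in path) for path in paths]


if __name__ == "__main__":
    found = realizable_paths("s1", {"s3", "s4"})
    print(found)
    print(path_conditions(found))
```

In this toy graph, the path through `s4` is rejected because its return site 2 does not match call site 1, so only the conditions of segments on the surviving path are collected. Because each segment's condition depends only on its own function, those per-segment collections are independent of one another, which is the property the sketch uses to run them in a thread pool.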