Optimization of the Context-Free Language Reachability Matrix-Based Algorithm

Ilia Muravev
2024-01-20
Abstract: Various static analysis problems are reformulated as instances of the Context-Free Language Reachability (CFL-r) problem. One promising way to make solving CFL-r more practical for large-scale interprocedural graphs is to reduce CFL-r to linear algebra operations on sparse matrices, since such operations execute efficiently on modern hardware. In this work, we present five optimizations for a matrix-based CFL-r algorithm that exploit specific properties of both the underlying semiring and the widely used linear algebra library SuiteSparse:GraphBLAS. Our experimental results show that these optimizations yield speedups of orders of magnitude, with the optimized matrix-based CFL-r algorithm consistently outperforming state-of-the-art CFL-r solvers across four considered static analyses.
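The abstract's central idea, reducing CFL-r to operations on sparse Boolean matrices, can be illustrated with a minimal sketch. It assumes a grammar in Weak Chomsky Normal Form (rules A → B C, A → a, A → ε). As a simplification, it represents each nonterminal's Boolean adjacency matrix as a Python set of (row, column) pairs rather than a SuiteSparse:GraphBLAS matrix. The function names are hypothetical. For every rule A → B C, the matrix of A absorbs the Boolean product of the matrices of B and C until no matrix changes. This is the naive fixpoint, before any of the paper's optimizations.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]


def multiply(a: Set[Pair], b: Set[Pair]) -> Set[Pair]:
    """Boolean sparse product: (i, j) is set iff (i, k) in a and (k, j) in b for some k."""
    b_rows: Dict[int, Set[int]] = defaultdict(set)
    for k, j in b:
        b_rows[k].add(j)
    return {(i, j) for i, k in a for j in b_rows.get(k, ())}


def cfl_reachability(
    edges: Iterable[Tuple[int, str, int]],
    nodes: Iterable[int],
    terminal_rules: List[Tuple[str, str]],      # (A, a) for each rule A -> a
    binary_rules: List[Tuple[str, str, str]],   # (A, B, C) for each rule A -> B C
    epsilon_nonterms: Iterable[str] = (),       # A for each rule A -> epsilon
) -> Dict[str, Set[Pair]]:
    m: Dict[str, Set[Pair]] = defaultdict(set)
    # Initialise from edge labels: M_A[u, v] = 1 if u -a-> v and A -> a.
    for u, label, v in edges:
        for a_nt, term in terminal_rules:
            if term == label:
                m[a_nt].add((u, v))
    # A -> epsilon relates every node to itself (identity matrix).
    node_list = list(nodes)
    for a_nt in epsilon_nonterms:
        m[a_nt].update((n, n) for n in node_list)
    # Naive fixpoint: recompute every product until no matrix changes.
    changed = True
    while changed:
        changed = False
        for a_nt, b_nt, c_nt in binary_rules:
            new = multiply(m[b_nt], m[c_nt]) - m[a_nt]
            if new:
                m[a_nt] |= new
                changed = True
    return dict(m)


# Balanced a...b paths: S -> A B | A X, X -> S B, A -> a, B -> b
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
result = cfl_reachability(edges, range(5), [("A", "a"), ("B", "b")],
                          [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")])
print(sorted(result["S"]))  # [(0, 4), (1, 3)]
```

Each iteration recomputes every product from scratch. That repeated work is what the first optimization in the overview targets.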
Programming Languages
What problem does this paper attempt to address?
This paper aims to make solving the Context-Free Language Reachability (CFL-r) problem more efficient, especially on large-scale interprocedural graphs. CFL-r is a core problem in static analysis. Given a labeled graph and a context-free language (CFL), it asks for the pairs of nodes connected by a path whose labels spell a word of that language. Many static analysis tasks can be reformulated as CFL-r problems, such as alias analysis, pointer analysis, value-flow analysis, and fixing compilation errors. The author builds on a matrix-based CFL-r algorithm, which reduces CFL-r to sparse matrix operations that modern hardware executes efficiently, and optimizes it. Specifically, the author proposes five optimizations that target the algorithm's bottlenecks: matrix multiplication and element-wise union. With these optimizations, the author achieves speedups of orders of magnitude, and the optimized matrix-based CFL-r algorithm consistently outperforms existing state-of-the-art CFL-r solvers on four static analyses.

### Overview of Optimization Measures

1. **Matrix Multiplication Optimization**: Replace the original matrix multiplication \( M \cdot_{R_Gr} M \) with \( (M_{old} \cdot_{R_Gr} \Delta M) \cup (\Delta M \cdot_{R_Gr} M) \), where \( \Delta M = M \setminus M_{old} \) is the element-wise set difference. The product \( M_{old} \cdot_{R_Gr} M_{old} \) was already computed in the previous iteration, so only products involving \( \Delta M \) can yield new facts. This avoids redundant computation.
2. **Sparse Matrix Format Optimization**: Maintain two copies of the matrix \( M \), one stored in row-major and one in column-major format. For each multiplication, select the format that suits the sparsity of the operands to improve computational efficiency.
3. **Matrix Storage Optimization**: Instead of storing \( M \) as a single matrix, decompose it into multiple sub-matrices \( \widetilde{M} = \{M_1, M_2, \ldots, M_p\} \) and merge these sub-matrices according to specific rules. This reduces the overhead of rebuilding the matrix in memory.
4. **CFG Production Rule Optimization**: For CFGs with a large number of production rules, use "indexed" nonterminals to reduce the number of Boolean matrix multiplications. For example, all rules of the form \( A_{R_i} \to A_{ret_i} \), which differ only in the index \( i \), are counted as a single rule.
5. **CFG Transformation Optimization**: For field-sensitive pointer analysis of Java and field-insensitive alias analysis of C/C++, manually transform the CFGs into Weak Chomsky Normal Form (WCNF) to improve performance.

### Experimental Results

The experimental results show that the optimized matrix-based CFL-r algorithm significantly outperforms existing tools such as POCR, Graspan, and Gigascale on multiple benchmarks. The gains appear in both speed and memory efficiency when processing large-scale graphs.

### Conclusions and Future Work

Through these optimizations, the author demonstrates the superior performance of the optimized matrix-based CFL-r algorithm on various problems. Future work includes a complexity analysis and the generalization of these optimizations to other algorithms.
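The first optimization in the overview above is a form of semi-naive evaluation. The following minimal sketch represents each nonterminal's Boolean matrix as a Python set of (row, column) pairs. It assumes, as a simplification, a WCNF grammar without ε-rules. For each rule A → B C it computes only \( B_{old} \cdot \Delta C \) and \( \Delta B \cdot C \), which mirrors \( (M_{old} \cdot \Delta M) \cup (\Delta M \cdot M) \) per nonterminal. Pairs from \( B_{old} \cdot C_{old} \) were already derived in an earlier round. The function name `cfl_reachability_incremental` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]


def multiply(a: Set[Pair], b: Set[Pair]) -> Set[Pair]:
    """Boolean sparse product: (i, j) is set iff (i, k) in a and (k, j) in b for some k."""
    b_rows: Dict[int, Set[int]] = defaultdict(set)
    for k, j in b:
        b_rows[k].add(j)
    return {(i, j) for i, k in a for j in b_rows.get(k, ())}


def cfl_reachability_incremental(
    edges: Iterable[Tuple[int, str, int]],
    terminal_rules: List[Tuple[str, str]],      # (A, a) for each rule A -> a
    binary_rules: List[Tuple[str, str, str]],   # (A, B, C) for each rule A -> B C
) -> Dict[str, Set[Pair]]:
    m: Dict[str, Set[Pair]] = defaultdict(set)
    for u, label, v in edges:
        for a_nt, term in terminal_rules:
            if term == label:
                m[a_nt].add((u, v))
    # In the first round, every initial fact counts as new.
    delta: Dict[str, Set[Pair]] = {nt: set(pairs) for nt, pairs in m.items()}
    while any(delta.values()):
        new: Dict[str, Set[Pair]] = defaultdict(set)
        for a_nt, b_nt, c_nt in binary_rules:
            d_b, d_c = delta.get(b_nt, set()), delta.get(c_nt, set())
            old_b = m[b_nt] - d_b
            # (B_old . dC) U (dB . C); B_old . C_old is skipped as already known.
            found = multiply(old_b, d_c) | multiply(d_b, m[c_nt])
            new[a_nt] |= found - m[a_nt]
        # Apply all of this round's facts at once, then they become the next delta.
        for nt, pairs in new.items():
            m[nt] |= pairs
        delta = new
    return dict(m)


# Transitive closure over a chain: S -> S S, S -> e
r = cfl_reachability_incremental([(0, "e", 1), (1, "e", 2), (2, "e", 3)],
                                 [("S", "e")], [("S", "S", "S")])
print(sorted(r["S"]))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

Each product now touches at least one operand that contains only the previous round's new facts. This is why the gain grows as \( \Delta M \) shrinks in later iterations.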
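The second optimization keeps \( M \) in both row-major and column-major form. The sketch below is a hypothetical plain-Python illustration of why both orientations help. It is not how SuiteSparse:GraphBLAS chooses formats internally. A Boolean product P · Q can either walk the nonzeros of P and look up rows of Q, or walk the nonzeros of Q and look up columns of P. With both orientations stored, the product can always iterate over the sparser operand. This fits products such as \( M_{old} \cdot \Delta M \), where \( \Delta M \) is often much sparser than \( M_{old} \). The class `DualFormatMatrix` is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[int, int]


class DualFormatMatrix:
    """Hypothetical sparse Boolean matrix kept in row-major and column-major form."""

    def __init__(self, pairs: Iterable[Pair] = ()):
        self.rows: Dict[int, Set[int]] = defaultdict(set)  # row i -> columns j
        self.cols: Dict[int, Set[int]] = defaultdict(set)  # column j -> rows i
        self.nnz = 0
        for i, j in pairs:
            self.add(i, j)

    def add(self, i: int, j: int) -> None:
        # Every insertion updates both copies so they never diverge.
        if j not in self.rows[i]:
            self.rows[i].add(j)
            self.cols[j].add(i)
            self.nnz += 1


def multiply(p: DualFormatMatrix, q: DualFormatMatrix) -> Set[Pair]:
    """Boolean product P . Q that iterates over whichever operand has fewer nonzeros."""
    if p.nnz <= q.nnz:
        # Walk P's nonzeros (i, k) and use the row-major copy of Q: row k.
        return {(i, j) for i, ks in p.rows.items() for k in ks for j in q.rows.get(k, ())}
    # Walk Q's nonzeros (k, j) and use the column-major copy of P: column k.
    return {(i, j) for j, ks in q.cols.items() for k in ks for i in p.cols.get(k, ())}


big = DualFormatMatrix([(0, 1), (0, 2), (3, 1)])
small = DualFormatMatrix([(1, 5)])
print(sorted(multiply(big, small)))  # [(0, 5), (3, 5)]
```

The trade-off is memory. Every matrix is stored twice, and every update must touch both copies.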
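For the third optimization, this summary does not specify the merge rule, so the sketch below assumes a hypothetical one. Each new \( \Delta M \) becomes its own block. The two newest blocks are then merged while the newer one has at least 1/`ratio` of the older one's nonzeros. Under this rule, block sizes shrink geometrically, so the number of blocks stays logarithmic in the number of nonzeros and large blocks are rebuilt only rarely. Products against the block set are computed block by block, which is valid because Boolean matrix multiplication distributes over union. The class `BlockedMatrix` and its `ratio` parameter are assumptions, not the paper's design.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def multiply(a: Set[Pair], b: Set[Pair]) -> Set[Pair]:
    """Boolean sparse product: (i, j) is set iff (i, k) in a and (k, j) in b for some k."""
    b_rows: Dict[int, Set[int]] = defaultdict(set)
    for k, j in b:
        b_rows[k].add(j)
    return {(i, j) for i, k in a for j in b_rows.get(k, ())}


class BlockedMatrix:
    """Hypothetical: M stored as disjoint blocks M_1..M_p, largest first."""

    def __init__(self, ratio: int = 2):
        self.blocks: List[Set[Pair]] = []
        self.ratio = ratio  # assumed merge threshold, not taken from the paper

    def __contains__(self, pair: Pair) -> bool:
        return any(pair in blk for blk in self.blocks)

    def nnz(self) -> int:
        return sum(len(blk) for blk in self.blocks)

    def add_delta(self, delta: Set[Pair]) -> None:
        """Append only unseen pairs as a new small block instead of rebuilding all of M."""
        fresh = {p for p in delta if p not in self}
        if not fresh:
            return
        self.blocks.append(fresh)
        # Merge the two newest blocks while the newer one is not much smaller.
        while len(self.blocks) >= 2 and self.ratio * len(self.blocks[-1]) >= len(self.blocks[-2]):
            newest = self.blocks.pop()
            self.blocks[-1] = self.blocks[-1] | newest  # rebuilds only this block

    def multiply_left(self, a: Set[Pair]) -> Set[Pair]:
        """a . M computed as the union of a . M_i over all blocks."""
        result: Set[Pair] = set()
        for blk in self.blocks:
            result |= multiply(a, blk)
        return result


bm = BlockedMatrix(ratio=2)
for d in ({(0, 1)}, {(1, 2), (2, 3)}, {(3, 4)}):
    bm.add_delta(d)
print(len(bm.blocks), bm.nnz())  # 2 4
```

Union with a small \( \Delta M \) usually touches only the newest, smallest blocks. The largest block is rebuilt only when enough new facts have accumulated.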
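The fourth optimization states only that indexed nonterminals reduce the number of Boolean matrix multiplications. The sketch below shows one hypothetical way to picture this, not the paper's encoding. It takes a family of rules A_i → B_i C that differ only in an index i and folds i into the row key. This stacks all matrices of B_i into one taller matrix, so a single product replaces one product per index. The rule shape A_i → B_i C and the functions `per_rule` and `indexed` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def multiply(a: Set[Pair], b: Set[Pair]) -> Set[Pair]:
    """Boolean sparse product over pair sets; row keys may be any hashable value."""
    b_rows: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for k, j in b:
        b_rows[k].add(j)
    return {(i, j) for i, k in a for j in b_rows.get(k, ())}


def per_rule(b_family: Dict[int, Set[Pair]], c: Set[Pair]) -> Dict[int, Set[Pair]]:
    """One product per rule A_i -> B_i C, i.e. len(b_family) multiplications."""
    return {i: multiply(b_i, c) for i, b_i in b_family.items()}


def indexed(b_family: Dict[int, Set[Pair]], c: Set[Pair]) -> Dict[int, Set[Pair]]:
    """All rules A_i -> B_i C in one product: the index i becomes part of the row key."""
    stacked = {((i, u), v) for i, b_i in b_family.items() for u, v in b_i}
    result: Dict[int, Set[Pair]] = defaultdict(set)
    for (i, u), w in multiply(stacked, c):
        result[i].add((u, w))  # unstack: row (i, u) belongs to A_i
    return dict(result)


family = {1: {(0, 1)}, 2: {(0, 2), (3, 1)}}
c = {(1, 5), (2, 6)}
print(indexed(family, c) == per_rule(family, c))  # True
```

Both functions compute the same facts here. The indexed version performs one multiplication regardless of how many indices the grammar has.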