Abstract: Context-free language (CFL) reachability is a standard approach in static analyses, where the analysis question is phrased as a language reachability problem on a graph $G$ with respect to a CFL $L$. While CFLs lack the expressiveness needed for high precision, common formalisms for context-sensitive languages are such that the corresponding reachability problem is undecidable. Are there useful context-sensitive language-reachability models for static analysis? In this paper, we introduce Multiple Context-Free Language (MCFL) reachability as an expressive yet tractable model for static program analysis. MCFLs form an infinite hierarchy of mildly context-sensitive languages parameterized by a dimension $d$ and a rank $r$. We show the utility of MCFL reachability by developing a family of MCFLs that approximate interleaved Dyck reachability, a common but undecidable static analysis problem. We show that MCFL reachability can be computed in $O(n^{2d+1})$ time on a graph of $n$ nodes when $r=1$, and $O(n^{d(r+1)})$ time when $r>1$. Moreover, we show that when $r=1$, the membership problem has a lower bound of $n^{2d}$ based on the Strong Exponential Time Hypothesis, while reachability for $d=1$ has a lower bound of $n^{3}$ based on the combinatorial Boolean Matrix Multiplication Hypothesis. Thus, for $r=1$, our algorithm is optimal within a factor $n$ for all levels of the hierarchy based on $d$. We implement our MCFL reachability algorithm and evaluate it by underapproximating interleaved Dyck reachability for a standard taint analysis for Android. Used alongside existing overapproximate methods, MCFL reachability discovers all tainted information on 8 out of 11 benchmarks, and confirms $94.3\%$ of the reachable pairs reported by the overapproximation on the remaining 3. To our knowledge, this is the first report of high and provable coverage for this challenging benchmark set.
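To make the notion of an interleaved Dyck language concrete, the following is a minimal sketch of a membership check on a single word, assuming the standard definition (a word belongs to the language when its projection onto each bracket alphabet is balanced). The function names and the `CALLS`/`FIELDS` alphabets are illustrative assumptions, not taken from the paper, and the sketch checks words only; it does not implement the paper's graph reachability algorithm.

```python
def is_dyck(word, pairs):
    """Single-stack check: is `word` balanced over the bracket pairs in `pairs`?"""
    closers = {close: open_ for open_, close in pairs.items()}
    stack = []
    for ch in word:
        if ch in pairs:                      # opening bracket
            stack.append(ch)
        elif ch in closers:                  # closing bracket must match the top
            if not stack or stack.pop() != closers[ch]:
                return False
        else:                                # symbol outside the alphabet
            return False
    return not stack


def is_interleaved_dyck(word, alphabets):
    """Every projection of `word` onto one bracket alphabet must be balanced."""
    known = set()
    for pairs in alphabets:
        known |= set(pairs) | set(pairs.values())
    if any(ch not in known for ch in word):
        return False
    for pairs in alphabets:
        symbols = set(pairs) | set(pairs.values())
        projection = [ch for ch in word if ch in symbols]
        if not is_dyck(projection, pairs):
            return False
    return True


# Hypothetical alphabets: parentheses stand in for call/return matching,
# square brackets for field write/read matching.
CALLS = {"(": ")"}
FIELDS = {"[": "]"}

print(is_interleaved_dyck("([)]", [CALLS, FIELDS]))  # projections "()" and "[]" are balanced
print(is_dyck("([)]", {**CALLS, **FIELDS}))          # one shared stack rejects the crossing
```

The word `([)]` is accepted by the interleaved check but rejected by the single-stack check, which illustrates why a single context-free Dyck language cannot track both bracket families at once.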

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **How can static analysis gain precision in context-sensitivity and field-sensitivity while remaining tractable?** Reachability analysis based on context-free languages (CFLs) is standard, but CFLs lack the expressive power needed for high precision, while common context-sensitive language formalisms make the corresponding reachability problem undecidable. This raises the question: is there a useful context-sensitive language-reachability model for static analysis?

To answer it, the authors introduce **Multiple Context-Free Language (MCFL) reachability** as a static program analysis model that is both expressive and tractable. MCFLs are parameterized by a dimension $d$ and a rank $r$ and form an infinite hierarchy of mildly context-sensitive languages. As $d$ and $r$ increase, the expressive power of MCFLs increases, which gives controllable analysis precision.

### Specific Problem Description

1. **Limitations of Traditional Methods**:
   - **Context-Free Languages (CFLs)**: Widely used for reachability analysis, but not expressive enough for high precision.
   - **Context-Sensitive Languages**: Common formalisms make the reachability problem undecidable.
2. **Research Objectives**:
   - Find a natural, efficient (polynomial-time), and practically accurate context-sensitive approximation of interleaved Dyck reachability, a common but undecidable problem in static analysis.
3. **Proposed Solutions**:
   - Introduce **Multiple Context-Free Languages (MCFLs)** as a new language model for reachability.
   - Design a family of MCFLs that approximate interleaved Dyck reachability and show its high coverage in practice.
   - Develop a general MCFL reachability algorithm and prove complexity lower bounds for the problem.

### Key Contributions

1. **MCFL Reachability as a Program Model**:
   - Propose MCFL reachability as an expressive and tractable context-sensitive formalism.
   - Varying the dimension $d$ and rank $r$ yields an infinite hierarchy of models with increasing expressive power and analysis precision.
2. **MCFL Reachability Algorithm**:
   - Develop a general algorithm for the $d$-MCFL($r$) reachability problem.
   - Prove the time complexity of the algorithm:
     - When $r = 1$, the time is $O(\text{poly}(|G|)\cdot\delta\cdot n^{2d})$, where $\delta$ is the maximum degree of the graph.
     - When $r>1$, the time is $O(\text{poly}(|G|)\cdot n^{d(r + 1)})$.
3. **Complexity Lower Bounds**:
   - Based on fine-grained complexity theory, prove lower bounds for the MCFL reachability and membership problems.
   - For $r = 1$, the membership problem has a lower bound of $n^{2d}$ under the Strong Exponential Time Hypothesis, so the algorithm is optimal within a factor of $n$.
4. **Experimental Evaluation**:
   - Implement the MCFL reachability algorithm and evaluate it on a standard taint analysis benchmark set for Android.
   - Used alongside existing overapproximate methods, MCFL reachability discovers all tainted information on 8 out of 11 benchmarks and confirms 94.3% of the reachable pairs reported by the overapproximation on the remaining 3.

Through these contributions, the paper shows the potential of MCFL reachability in static analysis, in particular as a more precise way to handle context-sensitivity and field-sensitivity together.
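As a worked instantiation of the upper bounds $O(n^{2d+1})$ (for $r=1$) and $O(n^{d(r+1)})$ (for $r>1$) quoted in the abstract, the exponents for a few small parameter values can be computed directly. The parameter values below are chosen for illustration only and are not taken from the paper's evaluation.

$$
\begin{aligned}
d=1,\ r=1 &:\quad O(n^{2\cdot 1+1}) = O(n^{3}) \\
d=2,\ r=1 &:\quad O(n^{2\cdot 2+1}) = O(n^{5}) \\
d=2,\ r=2 &:\quad O(n^{2\cdot(2+1)}) = O(n^{6}) \\
d=3,\ r=2 &:\quad O(n^{3\cdot(2+1)}) = O(n^{9})
\end{aligned}
$$

For $d=1$, $r=1$, the $O(n^{3})$ upper bound coincides with the $n^{3}$ conditional lower bound stated for reachability at $d=1$. For $d=2$, $r=1$, the $O(n^{5})$ upper bound sits a factor of $n$ above the $n^{2d} = n^{4}$ conditional lower bound for membership, which is the gap the abstract describes.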

Program Analysis via Multiple Context Free Language Reachability
