Data-Driven Template-Free Invariant Generation

Yuan Xia, Jyotirmoy V. Deshmukh, Mukund Raghothaman, Srivatsan Ravi
2023-12-29
Abstract: Automatic verification of concurrent programs faces state explosion due to the exponentially many possible interleavings of their sequential components, coupled with large or infinite state spaces. An alternative is deductive verification, where, given a candidate invariant, we establish inductive invariance and show that any state satisfying the invariant is also safe. However, learning (inductive) program invariants is difficult. To this end, we propose a data-driven procedure to synthesize program invariants, where it is assumed that the program invariant is an expression that characterizes a (hopefully tight) over-approximation of the reachable program states. The main ideas of our approach are: (1) We treat a candidate invariant as a classifier separating states observed in (sampled) program traces from those speculated to be unreachable. (2) We develop an enumerative, template-free approach to learn such classifiers from positive and negative examples. At its core, our enumerative approach employs decision trees to generate expressions that do not over-fit to the observed states (and thus generalize). (3) We employ a runtime framework to monitor program executions that may refute the candidate invariant; every refutation triggers a revision of the candidate invariant. Our runtime framework can be viewed as an instance of statistical model checking, which gives us probabilistic guarantees on the candidate invariant. We also show that, in some cases, our counterexample-guided inductive synthesis approach converges (in probability) to an over-approximation of the reachable set of states. Our experimental results show that our framework excels in learning useful invariants using only a fraction of the set of reachable states for a wide variety of concurrent programs.
Programming Languages, Systems and Control
### What problems does this paper attempt to solve?

This paper addresses the state-explosion problem in concurrent program verification, which arises from the exponentially many interleavings of a concurrent program's sequential components combined with large or infinite state spaces. Specifically, the paper proposes a data-driven method to automatically generate program invariants, which can then be used to prove the safety and correctness of programs.

#### Main problems:

1. **State-explosion problem**: The many possible interleavings and the large state spaces of concurrent programs make automatic verification very difficult.
2. **Learning inductive invariants**: Existing methods struggle to learn inductive invariants, in particular to generate them efficiently without predefined templates.

#### Core contributions of the paper:

1. **Data-driven, template-free invariant generation**: The paper learns program invariants from program execution traces without relying on predefined templates.
2. **Classifier separating observed and speculated states**: A candidate invariant is treated as a classifier that separates the states observed in sampled traces from the states speculated to be unreachable.
3. **Enumerative, template-free learning**: Classifiers are learned from positive and negative examples by an enumerative, template-free method. At its core, the method uses decision trees to generate expressions that do not overfit the observed states.
4. **Runtime monitoring framework**: A runtime framework monitors program executions that may refute the candidate invariant; every refutation triggers a revision of the candidate.
5. **Statistical model checking**: The runtime framework can be viewed as an instance of statistical model checking, which provides probabilistic guarantees on the learned invariant.
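The contributions listed above can be read as a small counterexample-guided loop: sample traces, learn a classifier, monitor further executions, and revise on every refutation. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. The two-counter program, the predicate grammar in `atoms`, and the names `sample_trace`, `learn`, `holds`, and `cegis` are all hypothetical, and a greedy conjunction of enumerated predicates stands in for the decision-tree learner that the paper uses.

```python
import random

# Hypothetical toy program (not from the paper): two threads share counters.
#   Thread A: x += 1 while x < 5      Thread B: y += 1 while y < x
def step(state, rng):
    x, y = state
    moves = []
    if x < 5:
        moves.append((x + 1, y))
    if y < x:
        moves.append((x, y + 1))
    # A random scheduler picks one enabled thread; stutter if none is enabled.
    return rng.choice(moves) if moves else state

def sample_trace(rng, length):
    state = (0, 0)
    trace = [state]
    for _ in range(length):
        state = step(state, rng)
        trace.append(state)
    return trace

def atoms(bound=8):
    """Enumerate atomic predicates from a tiny, assumed grammar."""
    preds = [("y <= x", lambda s: s[1] <= s[0]),
             ("x <= y", lambda s: s[0] <= s[1])]
    for c in range(bound + 1):
        preds.append((f"x <= {c}", lambda s, c=c: s[0] <= c))
        preds.append((f"y <= {c}", lambda s, c=c: s[1] <= c))
    return preds

def learn(positives, negatives):
    """Greedy stand-in for the classifier learner: keep only atoms that hold
    on every positive state, then repeatedly add the atom that rejects the
    most remaining negative states."""
    consistent = [(n, f) for n, f in atoms() if all(f(s) for s in positives)]
    remaining, chosen = set(negatives), []
    while remaining:
        name, f = max(consistent,
                      key=lambda p: sum(not p[1](s) for s in remaining))
        rejected = {s for s in remaining if not f(s)}
        if not rejected:
            break  # no atom separates the rest; accept a looser candidate
        chosen.append((name, f))
        remaining -= rejected
    return chosen

def holds(candidate, state):
    return all(f(state) for _, f in candidate)

def cegis(seed=0, max_rounds=20, monitor_runs=50):
    rng = random.Random(seed)
    universe = [(x, y) for x in range(9) for y in range(9)]
    # Deliberately short first sample, so the first candidate is too tight.
    positives = set(sample_trace(rng, length=4))
    candidate = []
    for _ in range(max_rounds):
        # Assumption: every state not yet observed is speculated unreachable.
        negatives = [s for s in universe if s not in positives]
        candidate = learn(positives, negatives)
        # Runtime monitoring: collect observed states that refute the candidate.
        refuting = {s for _ in range(monitor_runs)
                    for s in sample_trace(rng, length=12)
                    if not holds(candidate, s)}
        if not refuting:
            break
        positives |= refuting  # each refutation triggers a revision
    return candidate, positives
```

Calling `cegis()` and joining the predicate names of the returned candidate gives the conjunction of `y <= x` and `x <= 5` for this toy program. The first candidate is refuted during monitoring because the initial trace is too short to reach `x = 5`, and the revised candidate survives all monitored runs.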
### Formula representation

The key concepts involved in the paper can be stated formally as follows:

- **Set of reachable states**: a state is reachable if it occurs on some trace that starts in an initial state and follows the transition relation:
  \[ \text{Reach}(P, \text{Init}) = \{ s \mid \exists \sigma = s_0 s_1 \ldots s_n \in \text{Traces}(P),\ s_0 \in \text{Init},\ \forall i < n,\ (s_i, s_{i+1}) \in T,\ \exists j \le n,\ s = s_j \} \]
  where \(P\) is the program, \(\text{Init}\) is the set of initial states, and \(T\) is the state-transition relation.
- **Properties of invariants**:
  - **Safety**: every reachable state satisfies the candidate invariant \(I\), so \(I\) over-approximates the reachable set:
    \[ \forall s \in \text{Reach}(P, \text{Init}),\ s \models I \]
  - **Compactness (probabilistic)**: the probability that a state satisfies \(I\) without being reachable is below a small tolerance \(\epsilon\):
    \[ \Pr[(s \notin \text{Reach}(P, \text{Init})) \land (s \models I)] < \epsilon \]

These formulas state precisely what the algorithm proposed in the paper aims to achieve.
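On a finite toy system the definitions above can be checked directly. The sketch below is illustrative only: the two-counter transition relation in `successors`, the uniform distribution over a 9 x 9 grid used by `slack`, and all function names are assumptions that do not come from the paper. The helper `samples_needed` uses the standard bound that if N independent samples all pass, then a violation probability of at least `eps` is ruled out with confidence `1 - delta` once `(1 - eps)**N <= delta`; the paper's own statistical guarantee may be stated differently.

```python
from collections import deque
from math import ceil, log

# Hypothetical transition relation T: x may grow up to 5, y may grow up to x.
def successors(state):
    x, y = state
    out = []
    if x < 5:
        out.append((x + 1, y))
    if y < x:
        out.append((x, y + 1))
    return out

def reach(init):
    """Breadth-first computation of Reach(P, Init) for a finite system."""
    seen, queue = set(init), deque(init)
    while queue:
        s = queue.popleft()
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def is_safe(invariant, reachable):
    """Safety property: every reachable state satisfies the invariant."""
    return all(invariant(s) for s in reachable)

def slack(invariant, reachable, universe):
    """Pr[s not in Reach and s |= I] for s drawn uniformly from `universe`
    (the uniform distribution is an assumption of this sketch)."""
    extra = sum(1 for s in universe if invariant(s) and s not in reachable)
    return extra / len(universe)

def samples_needed(eps, delta):
    """Smallest N with (1 - eps)**N <= delta."""
    return ceil(log(delta) / log(1 - eps))

universe = [(x, y) for x in range(9) for y in range(9)]
reachable = reach([(0, 0)])
tight = lambda s: s[1] <= s[0] <= 5          # y <= x and x <= 5
loose = lambda s: s[0] <= 5 and s[1] <= 5    # a box around the reachable set
```

Both `tight` and `loose` satisfy the safety property here, but only `tight` has zero slack: `loose` also admits the states with `y > x`, which are never reached. This is the sense in which the compactness condition prefers a tight over-approximation.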