Abstract: Automatic verification of concurrent programs faces state explosion due to the exponentially many possible interleavings of their sequential components, coupled with large or infinite state spaces. An alternative is deductive verification, where, given a candidate invariant, we establish inductive invariance and show that any state satisfying the invariant is also safe. However, learning (inductive) program invariants is difficult. To this end, we propose a data-driven procedure to synthesize program invariants, where it is assumed that the program invariant is an expression that characterizes a (hopefully tight) over-approximation of the reachable program states. The main ideas of our approach are: (1) We treat a candidate invariant as a classifier separating states observed in (sampled) program traces from those speculated to be unreachable. (2) We develop an enumerative, template-free approach to learn such classifiers from positive and negative examples. At its core, our enumerative approach employs decision trees to generate expressions that do not over-fit to the observed states (and thus generalize). (3) We employ a runtime framework to monitor program executions that may refute the candidate invariant; every refutation triggers a revision of the candidate invariant. Our runtime framework can be viewed as an instance of statistical model checking, which gives us probabilistic guarantees on the candidate invariant. We also show that, in some cases, our counterexample-guided inductive synthesis approach converges (in probability) to an over-approximation of the reachable set of states. Our experimental results show that our framework excels in learning useful invariants using only a fraction of the set of reachable states for a wide variety of concurrent programs.
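The monitor-and-revise loop in idea (3) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy two-thread program in `run_once`, the interval-box learner `learn_box` (a deliberately simple stand-in for the decision-tree learner), and the stopping rule in `runs_needed`, which uses the standard bound that N consecutive non-refuting runs with N >= ln(delta) / ln(1 - eps) give confidence 1 - delta that a random run refutes the candidate with probability below eps.

```python
import math
import random

def runs_needed(eps, delta):
    """Number of consecutive clean runs required by the standard bound."""
    return math.ceil(math.log(delta) / math.log(1.0 - eps))

def run_once(rng, steps=20):
    """One random interleaving of a hypothetical two-thread toy program."""
    x, y = 0, 0
    trace = [(x, y)]
    for _ in range(steps):
        if rng.random() < 0.5:
            if x < 5:          # thread A: bounded increment of x
                x += 1
        elif y < x:            # thread B: y may only catch up to x
            y += 1
        trace.append((x, y))
    return trace

def learn_box(states):
    """Stand-in learner: tightest axis-aligned box around observed states."""
    xs = [s[0] for s in states]
    ys = [s[1] for s in states]
    return (min(xs), max(xs), min(ys), max(ys))

def holds(box, state):
    """Does the candidate invariant (a box) hold in this state?"""
    x_lo, x_hi, y_lo, y_hi = box
    return x_lo <= state[0] <= x_hi and y_lo <= state[1] <= y_hi

def refine(seed=0, eps=0.01, delta=0.01):
    """Revise the candidate on every refutation; stop after enough clean runs."""
    rng = random.Random(seed)
    observed = set(run_once(rng))
    box = learn_box(observed)
    target = runs_needed(eps, delta)
    clean, revisions = 0, 0
    while clean < target:
        trace = run_once(rng)
        if any(not holds(box, s) for s in trace):
            observed.update(trace)      # refutation: add the new evidence
            box = learn_box(observed)   # and relearn the candidate
            revisions += 1
            clean = 0
        else:
            clean += 1
    return box, revisions

if __name__ == "__main__":
    box, revisions = refine()
    print("candidate invariant box:", box, "after", revisions, "revisions")
```

The loop terminates because each refutation strictly enlarges the box and the toy state space is finite. The guarantee it illustrates is probabilistic only: the final candidate may still miss rarely reached states.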

What problem does this paper attempt to address?

This paper aims to solve the state-explosion problem in concurrent program verification, which arises from the exponentially many interleaved executions of a concurrent program's sequential components and from huge or infinite state spaces. Specifically, the paper proposes a data-driven method to automatically generate program invariants, which can then be used to prove program safety.

#### Main problems:

1. **State-explosion problem**: Because of the complex interleaved executions and huge state spaces of concurrent programs, automatic verification becomes very difficult.
2. **Learning inductive invariants**: Existing methods struggle to learn inductive invariants, and in particular to generate them efficiently without predefined templates.

#### Core contributions of the paper:

1. **Data-driven, template-free invariant generation**: The paper learns program invariants by analyzing program execution traces, without relying on predefined templates.
2. **Classifier separating observed and speculated states**: The candidate invariant is treated as a classifier that separates the observed reachable states from the states speculated to be unreachable.
3. **Enumerative, template-free learning**: Classifiers are learned from positive and negative examples by an enumerative, template-free method. At its core, it uses decision trees to generate expressions that do not overfit the observed states.
4. **Runtime monitoring framework**: A runtime framework monitors program executions that may refute the candidate invariant; every refutation triggers a revision of the candidate.
5. **Statistical model checking**: The runtime framework can be viewed as an instance of statistical model checking, which gives probabilistic guarantees on the learned invariant.
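The classifier view in contributions 2 and 3 can be sketched in plain Python as a greedy decision tree over an enumerated pool of atomic predicates. This is an illustrative sketch, not the paper's algorithm: the predicate grammar (`x`, `y`, `x - y` compared against small constants), the Gini-based split choice, and the example states are all assumptions.

```python
from itertools import product

def enumerate_predicates(consts=range(-2, 6)):
    """Enumerate atomic predicates 'term <= c' over states (x, y).
    The grammar (three terms, small constants) is an assumption."""
    terms = {
        "x": lambda s: s[0],
        "y": lambda s: s[1],
        "x - y": lambda s: s[0] - s[1],
    }
    preds = []
    for (name, f), c in product(terms.items(), consts):
        preds.append((f"{name} <= {c}", lambda s, f=f, c=c: f(s) <= c))
    return preds

def gini(n_pos, n_neg):
    """Gini impurity of a node holding n_pos positive and n_neg negative states."""
    n = n_pos + n_neg
    if n == 0:
        return 0.0
    p = n_pos / n
    return 2 * p * (1 - p)

def build_tree(pos, neg, preds):
    """Greedy tree: leaves are True (inside invariant) or False (outside)."""
    if not neg:
        return True
    if not pos:
        return False
    best = None
    for text, p in preds:
        pt = [s for s in pos if p(s)]
        nt = [s for s in neg if p(s)]
        pf = [s for s in pos if not p(s)]
        nf = [s for s in neg if not p(s)]
        if not (pt or nt) or not (pf or nf):
            continue  # predicate does not split this node
        score = (len(pt) + len(nt)) * gini(len(pt), len(nt)) \
              + (len(pf) + len(nf)) * gini(len(pf), len(nf))
        if best is None or score < best[0]:
            best = (score, text, p, pt, nt, pf, nf)
    if best is None:
        raise ValueError("predicate pool cannot separate the examples")
    _, text, p, pt, nt, pf, nf = best
    return (text, p, build_tree(pt, nt, preds), build_tree(pf, nf, preds))

def classify(tree, state):
    """Evaluate the learned candidate invariant on one state."""
    while not isinstance(tree, bool):
        _, p, if_true, if_false = tree
        tree = if_true if p(state) else if_false
    return tree

def to_formula(tree):
    """Render the tree as a Boolean expression over the atomic predicates."""
    if isinstance(tree, bool):
        return "true" if tree else "false"
    text, _, t, f = tree
    return f"(({text}) and {to_formula(t)}) or (not ({text}) and {to_formula(f)})"

if __name__ == "__main__":
    # Hypothetical data: observed states satisfy y <= x, speculated ones do not.
    pos = [(x, y) for x in range(6) for y in range(x + 1)]
    neg = [(x, y) for x in range(6) for y in range(6) if y > x]
    tree = build_tree(pos, neg, enumerate_predicates())
    print(to_formula(tree))
```

Because the tree keeps splitting until every leaf is pure, it classifies all training examples correctly whenever the predicate pool can separate them. Preferring small trees over shallow splits is what keeps the resulting expression from memorizing individual states.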
### Formula representation

The key concepts involved in the paper can be stated formally as follows:

- **Set of reachable states**:
  \[ \text{Reach}(P, \text{Init}) = \{\, s \mid \exists \sigma = s_0 s_1 \dots s_n \in \text{Traces}(P) :\ s_0 \in \text{Init},\ \forall i < n,\ (s_i, s_{i+1}) \in T,\ s_n = s \,\} \]
  where \(P\) is the program, \(\text{Init}\) is the set of initial states, and \(T\) is the state-transition relation.
- **Properties of invariants**:
  - **Invariance** (every reachable state satisfies the invariant \(I\), so \(I\) over-approximates the reachable set):
    \[ \forall s \in \text{Reach}(P, \text{Init}),\ s \models I \]
  - **Tightness (probabilistic)**:
    \[ \Pr[(s \notin \text{Reach}(P, \text{Init})) \land (s \models I)] < \epsilon \]
    where \(\epsilon\) is a small error bound.

These formulas make the goals of the algorithm proposed in the paper precise.
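On a small finite-state example, the definitions above can be checked directly. The following Python sketch is only an illustration under stated assumptions: the transition relation in `successors` is a hypothetical toy program, and the probability is estimated by sampling states uniformly from a bounded box, a distribution chosen here for convenience rather than taken from the paper.

```python
import random
from collections import deque

def successors(state):
    """Transition relation T of a hypothetical two-thread toy program."""
    x, y = state
    nxt = []
    if x < 5:
        nxt.append((x + 1, y))   # thread A increments x up to 5
    if y < x:
        nxt.append((x, y + 1))   # thread B lets y catch up to x
    return nxt

def reach(init):
    """Reach(P, Init): all states on some trace starting in Init (BFS)."""
    seen = set(init)
    queue = deque(init)
    while queue:
        s = queue.popleft()
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def is_over_approximation(inv, reachable):
    """Check that every reachable state satisfies the invariant."""
    return all(inv(s) for s in reachable)

def estimate_slack(inv, reachable, rng, samples=10_000, bound=8):
    """Estimate Pr[s not reachable and s satisfies inv] for s drawn
    uniformly from [-bound, bound]^2 (the distribution is an assumption)."""
    hits = 0
    for _ in range(samples):
        s = (rng.randint(-bound, bound), rng.randint(-bound, bound))
        if inv(s) and s not in reachable:
            hits += 1
    return hits / samples

def tight_inv(s):
    return 0 <= s[1] <= s[0] <= 5

def loose_inv(s):
    return 0 <= s[0] <= 5 and 0 <= s[1] <= 5

if __name__ == "__main__":
    R = reach([(0, 0)])
    rng = random.Random(0)
    for name, inv in [("tight", tight_inv), ("loose", loose_inv)]:
        print(name, is_over_approximation(inv, R), estimate_slack(inv, R, rng))
```

Both candidates over-approximate the reachable set, but only the first one does so without admitting unreachable states. Exhaustive search like `reach` is feasible only for tiny state spaces, which is why a sampling-based estimate is the relevant check for realistic programs.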

Data-Driven Template-Free Invariant Generation

Learning Likely Invariants to Explain Why a Program Fails

Demystifying Template-Based Invariant Generation for Bit-Vector Programs

Probabilistic Program Verification Via Inductive Synthesis of Inductive Invariants.

Data-driven invariant learning for probabilistic programs

Invariant Generation through Strategy Iteration in Succinctly Represented Control Flow Graphs

Using Dynamic Analysis to Generate Disjunctive Invariants

Invariant Detection with Program Verification Tools

A Novel Data-Driven Approach for Generating Verified Loop Invariants *

A novel data-driven approach on inferring loop invariants for C programs

Synthesizing Short-Circuiting Validation of Data Structure Invariants

Structural Invariants for Parametric Verification of Systems with Almost Linear Architectures

Probabilistic Conditional System Invariant Generation with Bayesian Inference

Generalized Homogeneous Polynomials for Efficient Template-Based Nonlinear Invariant Synthesis

Synthesizing invariants by solving solvable loops

Syndicate: Synergistic Synthesis of Ranking Function and Invariants for Termination Analysis

Inferring Inductive Invariants from Phase Structures

Generating Loop Invariants for Program Verification by Transformation

Proofs as Relational Invariants of Synthesized Execution Grammars

Beyond the elementary representations of program invariants over algebraic data types

Encoding Inductive Invariants As Barrier Certificates: Synthesis Via Difference-of-convex Programming.