Cheng Zhang, Tobias Kappé, David E. Narváez, Nico Naus
Abstract: Guarded Kleene Algebra with Tests (GKAT) provides a sound and complete framework to reason about trace equivalence between simple imperative programs. However, there are still several notable limitations. First, GKAT is completely agnostic with respect to the meaning of primitives, to keep equivalence decidable. Second, GKAT excludes non-local control flow such as goto, break, and return. To overcome these limitations, we introduce Control-Flow GKAT (CF-GKAT), a system that allows reasoning about programs that include non-local control flow as well as hardcoded values. CF-GKAT is able to soundly and completely verify trace equivalence of a larger class of programs, while preserving the nearly-linear efficiency of GKAT. This makes CF-GKAT suitable for the verification of control-flow manipulating procedures, such as decompilation and goto-elimination. To demonstrate CF-GKAT's abilities, we validated the output of several highly non-trivial program transformations, such as Erosa and Hendren's goto-elimination procedure and the output of the Ghidra decompiler. CF-GKAT opens up the application of Kleene Algebra to a wider set of challenges, and provides an important verification tool that can be applied to the field of decompilation and control-flow transformation.
What problem does this paper attempt to address?
The paper addresses the limitations of Guarded Kleene Algebra with Tests (GKAT) when it is used to verify control-flow transformations of programs. Specifically, GKAT has two main limitations:
1. **Ignoring the Meaning of Primitive Actions and Tests**: To keep equivalence decidable, GKAT treats primitive actions and tests as uninterpreted symbols. As a result, it cannot verify some simple equivalences. For example, when an assignment to a variable does not affect the condition of a branch, the assignment can be moved out of the conditional, but GKAT cannot justify this step.
2. **Excluding Non-local Control-flow Structures**: GKAT has no constructs for non-local control flow such as `goto`, `break`, and `return`. It therefore cannot directly express the control flow of many real-world programs, although these structures can be simulated by introducing additional variables.
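The remark that non-local control flow can be simulated with additional variables can be illustrated with a small Python sketch. The function names, the `stop` flag, and the recorded action strings are hypothetical and chosen only for illustration. The sketch compares the two programs by running them on sample inputs, which is only a sanity check and not the algebraic equivalence check that the paper describes.

```python
def with_break(items, is_stop):
    """Loop that leaves early via `break` (non-local control flow)."""
    trace = []
    for item in items:
        if is_stop(item):
            trace.append("found")
            break
        trace.append(f"visit {item}")
    trace.append("done")
    return trace


def with_flag(items, is_stop):
    """Same loop, with `break` simulated by the extra variable `stop`.

    `stop` is only assigned hard-coded constants and only compared
    against hard-coded constants.
    """
    trace = []
    stop = 0
    index = 0
    while stop == 0 and index < len(items):
        item = items[index]
        if is_stop(item):
            trace.append("found")
            stop = 1  # plays the role of `break`
        else:
            trace.append(f"visit {item}")
            index += 1
    trace.append("done")
    return trace


if __name__ == "__main__":
    sample = [1, 2, 3]
    print(with_break(sample, lambda x: x == 2))
    print(with_flag(sample, lambda x: x == 2))
```

Both functions record the same sequence of actions on these inputs. A flag that is restricted to hard-coded constants is the kind of variable that the indicator-variable extension is meant to cover.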
To address these limitations, the paper proposes Control-Flow GKAT (CF-GKAT), an extension of GKAT. CF-GKAT extends GKAT in two ways:
1. **Introducing Indicator Variables**: CF-GKAT allows indicator variables to be assigned and tested, with the restriction that they can only be compared with hard-coded values. This lets CF-GKAT verify the correctness of program transformations that introduce indicator variables.
2. **Supporting Non-local Control-flow Structures**: CF-GKAT adds support for non-local control-flow structures such as `goto`, `break`, and `return`. To handle these structures, the paper introduces an intermediate continuation semantics, which attaches a "continuation" to each trace to indicate how execution proceeds after the trace ends.
With these extensions, CF-GKAT can verify trace equivalence for a larger class of programs while retaining the nearly-linear efficiency of GKAT. This makes CF-GKAT a useful tool for verifying control-flow transformations such as decompilation and `goto` elimination.
### Formula Summary
- **Indicator Variable Test**:
\[
e \in \text{BExp} ::= \text{false} \mid \text{true} \mid p \in \mathcal{T} \mid x = c \mid e_1 \lor e_2 \mid e_1 \land e_2 \mid \neg e
\]
where \(x = c\) tests whether the indicator variable \(x\) is equal to the hard-coded value \(c\), and \(p\) ranges over the primitive tests \(\mathcal{T}\).
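The test grammar above can be read as a small abstract syntax tree. The following Python sketch is one possible encoding, not the paper's implementation: the class names, the `holds` function, the use of integers for indicator values, and the representation of a program state as a set of true primitive tests plus one indicator value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Const:
    value: bool  # the constants `true` and `false`


@dataclass(frozen=True)
class Prim:
    name: str  # an uninterpreted primitive test p


@dataclass(frozen=True)
class IndEq:
    const: int  # the indicator test x = c, for a hard-coded c


@dataclass(frozen=True)
class Not:
    arg: "BExp"


@dataclass(frozen=True)
class And:
    left: "BExp"
    right: "BExp"


@dataclass(frozen=True)
class Or:
    left: "BExp"
    right: "BExp"


BExp = Union[Const, Prim, IndEq, Not, And, Or]


def holds(test: BExp, true_prims: FrozenSet[str], indicator: int) -> bool:
    """Evaluate a test given the primitive tests that are true and the
    current value of the indicator variable."""
    if isinstance(test, Const):
        return test.value
    if isinstance(test, Prim):
        return test.name in true_prims
    if isinstance(test, IndEq):
        return indicator == test.const
    if isinstance(test, Not):
        return not holds(test.arg, true_prims, indicator)
    if isinstance(test, And):
        return (holds(test.left, true_prims, indicator)
                and holds(test.right, true_prims, indicator))
    if isinstance(test, Or):
        return (holds(test.left, true_prims, indicator)
                or holds(test.right, true_prims, indicator))
    raise TypeError(f"not a test: {test!r}")


if __name__ == "__main__":
    guard = And(Prim("p"), Not(IndEq(1)))  # p and not (x = 1)
    print(holds(guard, frozenset({"p"}), 0))
    print(holds(guard, frozenset({"p"}), 1))
```

Primitive tests stay uninterpreted in this encoding: only their truth value is known. The indicator test is the one place where a concrete value is inspected.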
- **Continuation Semantics**:
\[
k \in \mathcal{C} ::= \text{acc}(c) \mid \text{brk}(c) \mid \text{ret} \mid \text{jmp}(\ell, c)
\]
where \(\mathcal{C}\) is the set of continuations, \(c\) is a value of the indicator variable, and \(\ell \in \mathcal{L}\) is a label.
- **Composition Operation under Continuation Semantics**:
\[
(\mathcal{A} \diamond \mathcal{B})(c) = \{w \cdot w' \mid w \cdot \text{acc}(c') \in \mathcal{A}(c),\ w' \in \mathcal{B}(c')\} \;\cup\; \{w \cdot k \mid w \cdot k \in \mathcal{A}(c),\ k \neq \text{acc}(c') \text{ for all } c'\}
\]
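The composition above can be sketched in Python under some simplifying assumptions. A trace is modelled as a pair of a tuple of action names and a final continuation. The atoms that appear in guarded strings are omitted, indicator values are integers, and the names `Acc`, `Brk`, `Ret`, `Jmp`, and `compose` are hypothetical. Traces of the first program that end in an accepting continuation are extended with traces of the second program, started from the indicator value carried by that continuation. All other traces are kept unchanged.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Acc:
    value: int  # normal termination with indicator value `value`


@dataclass(frozen=True)
class Brk:
    value: int  # termination by `break`


@dataclass(frozen=True)
class Ret:
    pass  # termination by `return`


@dataclass(frozen=True)
class Jmp:
    label: str  # termination by `goto label`
    value: int


Continuation = Union[Acc, Brk, Ret, Jmp]
Trace = Tuple[Tuple[str, ...], Continuation]
Semantics = Callable[[int], FrozenSet[Trace]]


def compose(first: Semantics, second: Semantics) -> Semantics:
    """Sequential composition of two continuation semantics."""
    def composed(c: int) -> FrozenSet[Trace]:
        result = set()
        for word, k in first(c):
            if isinstance(k, Acc):
                # Accepting trace: continue with `second`, started from
                # the indicator value that `first` ended with.
                for word2, k2 in second(k.value):
                    result.add((word + word2, k2))
            else:
                # brk, ret, and jmp skip the rest of the sequence.
                result.add((word, k))
        return frozenset(result)
    return composed


def first(c: int) -> FrozenSet[Trace]:
    # Either run `a` and accept with indicator 1, or run `b` and return.
    return frozenset({(("a",), Acc(1)), (("b",), Ret())})


def second(c: int) -> FrozenSet[Trace]:
    # Run `x` and accept, leaving the indicator unchanged.
    return frozenset({(("x",), Acc(c))})


if __name__ == "__main__":
    for trace in sorted(compose(first, second)(0), key=repr):
        print(trace)
```

In this example the trace ending in `Ret()` passes through the composition untouched, while the accepting trace `a` is extended to `a x`.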