Abstract: Rust is a programming language that combines memory safety and low-level control, providing C-like performance while guaranteeing the absence of undefined behaviors by default. Rust's growing popularity has prompted research on safe and correct transpiling of existing code-bases to Rust. Existing work falls into two categories: rule-based and large language model (LLM)-based. While rule-based approaches can theoretically produce correct transpilations that maintain input-output equivalence to the original, they often yield unreadable Rust code that uses unsafe subsets of the Rust language. On the other hand, while LLM-based approaches typically produce more readable, maintainable, and safe code, they do not provide any guarantees about correctness. In this work, we present VERT, a tool that can produce readable Rust transpilations with formal guarantees of correctness. VERT's only requirement is that there is a WebAssembly compiler for the source language, which is true for most major languages. VERT first uses the WebAssembly compiler to obtain an oracle Rust program. In parallel, VERT uses an LLM to generate a readable candidate Rust program. This candidate is verified against the oracle, and if verification fails, we regenerate a new candidate transpilation until verification succeeds. We evaluate VERT by transpiling a suite of 1,394 programs taken from competitive programming style benchmarks. Combining Anthropic's Claude-2 and VERT increases Rust transpilations passing property-based testing from 31% to 54% and bounded model-checking from 1% to 42% compared to using Claude alone. In addition, we evaluate VERT's ability to generate non-trivial safe Rust on programs taken from real-world C projects that make significant use of pointers. Our results provide insights into the limitations of LLMs to write safe Rust.

What problem does this paper attempt to address?

This paper addresses the problem of safely and correctly converting existing codebases to Rust. Specifically, it targets two limitations of prior work:

1. **Limitations of the rule-based approach**: Rule-based transpilers can in theory generate correct code that preserves input-output equivalence, but the resulting Rust is often hard to read and maintain, and it may rely on the unsafe subset of the language.
2. **Limitations of the LLM-based approach**: LLMs usually generate more readable, maintainable, and safer code, but they provide no formal guarantee of correctness. An LLM may produce code that looks reasonable yet is subtly wrong, and such errors can be hard to find and debug.

To address both, the authors propose a tool named VERT. VERT combines the strengths of the rule-based and LLM-based approaches and adds formal verification, so that the generated Rust code is both readable and functionally equivalent to the original.

### How VERT works

The main workflow of VERT is as follows:

1. **Generate the oracle Rust program**: Compile the source program to WebAssembly with the source language's WebAssembly compiler, then convert the WebAssembly to Rust with the rWasm tool. The resulting Rust is functionally equivalent to the original program, but it may be difficult to read.
2. **Generate a candidate Rust program**: Use an LLM to generate a more readable candidate Rust version.
3. **Verify the candidate program**: Check the LLM-generated candidate against the oracle Rust program for functional equivalence. If verification fails, regenerate a new candidate and repeat until verification succeeds.

In this way, VERT aims to produce readable Rust code while ensuring its correctness and safety.

### Experimental results

The authors evaluated VERT experimentally, with the following results:

- Combining Anthropic's Claude-2 with VERT increased the proportion of Rust transpilations passing property-based testing from 31% to 54%, and the proportion passing bounded model checking from 1% to 42%, compared to using Claude alone.
- For C code from real-world projects, VERT can generate Rust with a relatively simple ownership model for small programs (fewer than 36 lines of code).

In conclusion, VERT improves the success rate and quality of transpiling code from other languages to Rust by combining the rule-based approach, LLMs, and formal verification.
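The generate-and-verify loop described above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not VERT's implementation: the oracle (which VERT obtains via WebAssembly and rWasm) and the LLM candidates are replaced by hard-coded functions, and equivalence is checked by differential testing on a fixed input list rather than by property-based testing or bounded model checking. All names (`Program`, `Verdict`, `check_equivalence`, `generate_until_verified`, `demo`) are hypothetical.

```rust
/// A "program" is modeled as a pure function from i64 to i64 (an assumption
/// made to keep the sketch small).
type Program = fn(i64) -> i64;

/// Result of comparing one candidate against the oracle.
#[derive(Debug, PartialEq)]
enum Verdict {
    Equivalent,
    /// First input on which oracle and candidate disagree.
    Counterexample(i64),
}

/// Fixed sample inputs, including boundary values.
const SAMPLE_INPUTS: [i64; 7] = [-5, -1, 0, 1, 7, i64::MIN, i64::MAX];

/// Stand-in for the machine-generated oracle: correct but hard to read.
/// Computes a wrapping absolute value with bit tricks.
fn oracle_abs(x: i64) -> i64 {
    let mask = x >> 63; // -1 for negative x, 0 otherwise
    (x ^ mask).wrapping_sub(mask)
}

/// Stand-in for a first LLM candidate: readable but wrong for negatives.
fn candidate_buggy(x: i64) -> i64 {
    if x > 0 { x } else { 0 }
}

/// Stand-in for a regenerated candidate: readable and equivalent.
fn candidate_fixed(x: i64) -> i64 {
    x.wrapping_abs()
}

/// Differential check on the given inputs. This only samples behavior;
/// it is a simplification of the verification step, not a proof.
fn check_equivalence(oracle: Program, candidate: Program, inputs: &[i64]) -> Verdict {
    for &x in inputs {
        if oracle(x) != candidate(x) {
            return Verdict::Counterexample(x);
        }
    }
    Verdict::Equivalent
}

/// Request candidates until one passes the check or the budget runs out.
/// Returns the index of the successful attempt and the accepted candidate.
fn generate_until_verified(
    oracle: Program,
    mut generate: impl FnMut(usize) -> Program,
    inputs: &[i64],
    max_attempts: usize,
) -> Option<(usize, Program)> {
    for attempt in 0..max_attempts {
        let candidate = generate(attempt);
        if check_equivalence(oracle, candidate, inputs) == Verdict::Equivalent {
            return Some((attempt, candidate));
        }
    }
    None
}

/// Example run: the first candidate is rejected, the second is accepted.
fn demo() -> Option<(usize, Program)> {
    let generate = |attempt: usize| {
        if attempt == 0 {
            candidate_buggy as Program
        } else {
            candidate_fixed as Program
        }
    };
    generate_until_verified(oracle_abs, generate, &SAMPLE_INPUTS, 3)
}
```

Calling `demo()` from a `main` function returns the second candidate after the first is rejected on a negative input. The attempt budget (`max_attempts`) is an addition for the sketch so the loop always terminates; the source describes regenerating until verification succeeds.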

VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners

Towards Translating Real-World Code with LLMs: A Study of Translating to Rust

Leveraging Large Language Models for Automated Proof Synthesis in Rust

Scalable, Validated Code Translation of Entire Projects using Large Language Models

Repository-level Code Translation Benchmark Targeting Rust

Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models

Aeneas: Rust Verification by Functional Translation

AutoVerus: Automated Proof Generation for Rust Code

Surveying the Rust Verification Landscape

AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement

Automated Proof Generation for Rust Code via Self-Evolution

Translating C To Rust: Lessons from a User Study

Towards a Transpiler for C/C++ to Safer Rust

Leveraging Large Language Model to Assist Detecting Rust Code Comment Inconsistency

Code Translation with Compiler Representations

Enhancing Translation Validation of Compiler Transformations with Large Language Models

Verification of a Rust Implementation of Knuth's Dancing Links using ACL2

Refinement Proofs in Rust Using Ghost Locks

A hybrid approach to semi-automated Rust verification

Repository-Level Compositional Code Translation and Validation

A Mixed-Methods Study on the Implications of Unsafe Rust for Interoperation, Encapsulation, and Tooling