Abstract: Large language models (LLMs) are increasingly being used for the task of automated code translation, which has important real-world applications. However, most existing approaches use only the source code of a program as an input to an LLM, and do not consider the different kinds of specifications that can be extracted from a program. In this paper, we propose SpecTra, a multi-stage approach that uses a novel self-consistency filter to first generate high-quality static specifications, test cases, and natural language descriptions from a given program, and then uses these along with the source code to improve the quality of LLM-generated translations. We evaluate SpecTra on three code translation tasks - C to Rust, C to Go, and JavaScript to TypeScript - and show that it can enhance the performance of six popular LLMs on these tasks by up to 10 percentage points and a relative improvement of 26%. Our research suggests that generating high-quality specifications could be a promising and efficient way to improve the performance of LLMs for code translation. We make our code and data available, anonymized for review.

What problem does this paper attempt to address?

The paper asks how to improve the performance of large language models (LLMs) on automated code translation. The authors propose SpecTra, a method that generates multi-modal specifications (static specifications, input-output test cases, and natural-language descriptions) and supplies them to the LLM alongside the source code.

### Problem Background

1. **Importance of Code Translation**
   - As newer programming languages (such as Rust, Go, and TypeScript) gain ground, older languages (such as C and JavaScript) are gradually being phased out.
   - Code written in older languages is costly to maintain, prone to security vulnerabilities, and difficult to improve. This burden is known as "technical debt" and costs the United States more than 1 trillion US dollars annually.
   - There is therefore an urgent need to translate legacy code into modern programming languages.
2. **Limitations of Existing Methods**
   - Traditional transpilers can ensure functional correctness, but the code they generate is often unidiomatic and hard to read and maintain.
   - LLMs can generate more readable code, but are less accurate than transpilers.

### Goals of SpecTra

SpecTra aims to combine the functional-correctness advantage of transpilers with the readability advantage of LLMs, in the following ways:

- **Generate High-Quality Specifications**: Generate static specifications, test cases, and natural-language descriptions from a given program.
- **Verify the Validity of Specifications**: Check that the generated specifications are self-consistent, that is, that code regenerated from a specification is consistent with the original code.
- **Guide Code Translation**: Provide these specifications together with the source code to the LLM to improve the quality of the translation.

### Main Contributions

1. Propose a self-consistency-based method for generating static specifications, test cases, and natural-language descriptions of programs.
2. Combine these self-consistent specifications with the source code in the SpecTra method, which uses multi-modal specifications to improve the quality of LLM-generated translations.
3. Evaluate SpecTra on three code translation tasks (C to Rust, C to Go, and JavaScript to TypeScript). The results show that it improves the performance of six popular LLMs by up to 10 percentage points in absolute terms and 26% in relative terms.

### Method Overview

The SpecTra process is divided into three stages:

1. **Specification Generation**: Use an LLM to generate static specifications, input-output test cases, and natural-language descriptions.
2. **Specification Verification**: Apply self-consistency checks to filter out specifications that do not match the original program.
3. **Specification-Guided Translation**: Provide the verified specifications together with the source code to the LLM to generate code in the target language.

In this way, SpecTra aims to improve the accuracy of code translation while keeping the readability of LLM-generated code.
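
The three stages in the method overview can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the prompt wording, the `run` executor, and every function name here are hypothetical, and the self-consistency check is approximated by regenerating code from a single specification and comparing its behavior with the original program on a set of input-output pairs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
LLM = Callable[[str], str]          # prompt text -> completion text
Runner = Callable[[str, str], str]  # (program text, input) -> output
TestCase = Tuple[str, str]          # (input, output of the ORIGINAL program)


@dataclass
class Spec:
    kind: str  # e.g. "static" or "description"
    text: str


def generate_specs(llm: LLM, source: str,
                   kinds: Sequence[str] = ("static", "description")) -> List[Spec]:
    """Stage 1: ask the model for one candidate specification per kind."""
    return [Spec(kind, llm(f"Write a {kind} specification for:\n{source}"))
            for kind in kinds]


def is_self_consistent(llm: LLM, spec: Spec, run: Runner,
                       tests: Sequence[TestCase]) -> bool:
    """Stage 2: regenerate code from the spec alone and compare its
    behavior with the original program's recorded outputs."""
    regenerated = llm(
        f"Write a program satisfying this {spec.kind} specification:\n{spec.text}"
    )
    return all(run(regenerated, inp) == expected for inp, expected in tests)


def translate(llm: LLM, source: str, specs: Sequence[Spec],
              target_lang: str) -> str:
    """Stage 3: prompt with the source code plus the surviving specs."""
    spec_block = "\n".join(f"[{s.kind}] {s.text}" for s in specs)
    return llm(f"Translate to {target_lang}:\n{source}\n{spec_block}")


def spectra_sketch(llm: LLM, source: str, run: Runner,
                   tests: Sequence[TestCase],
                   target_lang: str) -> Tuple[str, List[Spec]]:
    """Run the three stages and return (translation, specs that were kept)."""
    candidates = generate_specs(llm, source)
    kept = [s for s in candidates if is_self_consistent(llm, s, run, tests)]
    return translate(llm, source, kept, target_lang), kept
```

Passing the model and the program runner in as plain callables keeps the sketch independent of any particular LLM API or sandbox. A specification that fails the check is simply dropped, so the final prompt falls back to the source code alone when nothing survives.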

SpecTra: Enhancing the Code Translation Ability of Language Models by Generating Multi-Modal Specifications

Specification-Driven Code Translation Powered by Large Language Models: How Far Are We?

InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation

Scalable, Validated Code Translation of Entire Projects using Large Language Models

nl2spec: Interactively Translating Unstructured Natural Language to Temporal Logics with Large Language Models

CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution

SpecGen: Automated Generation of Formal Program Specifications via Large Language Models

Exploring and Unleashing the Power of Large Language Models in Automated Code Translation

Impact of Large Language Models on Generating Software Specifications

Formal Specifications from Natural Language

Exploring the Impact of the Output Format on the Evaluation of Large Language Models for Code Translation

TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation

Automatically Testing Functional Properties of Code Translation Models

Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code

Towards Translating Real-World Code with LLMs: A Study of Translating to Rust

SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

SCALE: Synergized Collaboration of Asymmetric Language Translation Engines

Exploring Human-Like Translation Strategy with Large Language Models

Code Translation with Compiler Representations