SpecTra: Enhancing the Code Translation Ability of Language Models by Generating Multi-Modal Specifications

Vikram Nitin, Rahul Krishna, Baishakhi Ray
2024-07-11
Abstract: Large language models (LLMs) are increasingly being used for the task of automated code translation, which has important real-world applications. However, most existing approaches use only the source code of a program as an input to an LLM, and do not consider the different kinds of specifications that can be extracted from a program. In this paper, we propose SpecTra, a multi-stage approach that uses a novel self-consistency filter to first generate high-quality static specifications, test cases, and natural language descriptions from a given program, and then uses these along with the source code to improve the quality of LLM-generated translations. We evaluate SpecTra on three code translation tasks (C to Rust, C to Go, and JavaScript to TypeScript) and show that it can enhance the performance of six popular LLMs on these tasks by up to 10 percentage points and a relative improvement of 26%. Our research suggests that generating high-quality specifications could be a promising and efficient way to improve the performance of LLMs for code translation. We make our code and data available, anonymized for review.
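The self-consistency filter mentioned in the abstract can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: `is_self_consistent`, `filter_specs`, and `regenerate` are hypothetical names, `regenerate` stands in for an LLM call that writes a program from a specification alone, and "consistent" is approximated as producing identical outputs on a fixed set of probe inputs. The paper's exact consistency criterion may differ per specification modality.

```python
from typing import Any, Callable, List, Sequence

# For illustration only, a "program" is modeled as a plain Python callable.
Program = Callable[[Any], Any]


def is_self_consistent(
    original: Program,
    spec: str,
    regenerate: Callable[[str], Program],
    probe_inputs: Sequence[Any],
) -> bool:
    """Return True if code regenerated from `spec` alone matches `original`
    on every probe input (an assumed consistency criterion)."""
    candidate = regenerate(spec)  # hypothetical stand-in for an LLM call
    for x in probe_inputs:
        expected = original(x)
        try:
            actual = candidate(x)
        except Exception:
            # A regenerated program that crashes counts as inconsistent.
            return False
        if actual != expected:
            return False
    return True


def filter_specs(
    original: Program,
    specs: List[str],
    regenerate: Callable[[str], Program],
    probe_inputs: Sequence[Any],
) -> List[str]:
    """Keep only the candidate specifications that pass the self-consistency check."""
    return [s for s in specs if is_self_consistent(original, s, regenerate, probe_inputs)]


if __name__ == "__main__":
    def square(x: int) -> int:
        return x * x

    # Toy "regenerator": a lookup table in place of an LLM.
    toy_regenerate = {
        "returns the square of x": lambda x: x * x,
        "returns x doubled": lambda x: 2 * x,
    }.__getitem__
    kept = filter_specs(square, ["returns the square of x", "returns x doubled"],
                        toy_regenerate, [0, 1, 2, 3])
    print(kept)  # ['returns the square of x']
```

The doubled-value specification agrees with the original on input 0 but not on input 1, so it is filtered out; this is why a check over several inputs is more informative than a single probe.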
Software Engineering
What problem does this paper attempt to address?
The problem this paper addresses is how to improve the performance of large language models (LLMs) on automated code translation. Specifically, the authors propose SpecTra, a method that enhances the code translation ability of LLMs by generating multi-modal specifications: static specifications, input-output test cases, and natural-language descriptions.

### Problem Background

1. **Importance of Code Translation**
   - As newer programming languages (such as Rust, Go, and TypeScript) gain adoption, older ones (such as C and JavaScript) are gradually being phased out.
   - Code written in older languages is costly to maintain, prone to security vulnerabilities, and difficult to improve. This burden is known as "technical debt", and it costs the United States more than $1 trillion annually.
   - There is therefore an urgent need to translate legacy code into modern programming languages.
2. **Limitations of Existing Methods**
   - Traditional transpilers can ensure functional correctness, but the code they generate is often unidiomatic and difficult to read and maintain.
   - LLMs generate more readable code, but their output is less accurate than that of transpilers.

### Goals of SpecTra

SpecTra aims to combine the functional-correctness advantage of transpilers with the readability advantage of LLMs. It does so as follows:

- **Generate High-Quality Specifications**: Generate static specifications, test cases, and natural-language descriptions from a given program.
- **Verify the Validity of Specifications**: Keep only specifications that are self-consistent, that is, specifications from which the LLM can regenerate code that is consistent with the original program.
- **Guide Code Translation**: Provide these specifications together with the source code to the LLM to improve the quality of the translation.

### Main Contributions

1. A self-consistency-based method for generating static specifications, test cases, and natural-language descriptions of programs.
2. SpecTra, which combines these self-consistent specifications with the source code, using multi-modal specifications to improve the quality of LLM-generated code translations.
3. An evaluation of SpecTra on three code translation tasks (C to Rust, C to Go, and JavaScript to TypeScript), showing that it improves the performance of six popular LLMs by up to 10 percentage points in absolute terms and up to 26% in relative terms.

### Method Overview

The SpecTra process has three stages:

1. **Specification Generation**: Use an LLM to generate static specifications, input-output test cases, and natural-language descriptions.
2. **Specification Verification**: Use self-consistency checks to filter out specifications that are likely to be incorrect.
3. **Specification-Guided Translation**: Provide the verified specifications together with the source code to an LLM to generate code in the target language.

In this way, SpecTra improves the accuracy of LLM-generated translations while retaining the readability that makes LLM output attractive.
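The specification-guided translation stage described above can be sketched as a small driver. This is a minimal, hypothetical sketch, not the authors' code: `Specs`, `build_prompt`, `spec_guided_translate`, `llm`, and `passes_tests` are illustrative names; `llm` stands in for a call to any of the evaluated models, and `passes_tests` stands in for compiling the candidate translation and running the verified test cases. The prompt wording and the order in which specification variants are tried are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Specs:
    """Verified specifications, one field per modality (hypothetical container)."""
    static: Optional[str] = None                    # e.g. pre/postconditions
    tests: List[str] = field(default_factory=list)  # input-output examples
    description: Optional[str] = None               # natural-language summary


def build_prompt(source: str, src_lang: str, tgt_lang: str, specs: Specs) -> str:
    """Combine the source program with whichever specifications are present."""
    parts = [f"Translate this {src_lang} program to {tgt_lang}.", source]
    if specs.static:
        parts.append("Static specification:\n" + specs.static)
    if specs.tests:
        parts.append("Input-output test cases:\n" + "\n".join(specs.tests))
    if specs.description:
        parts.append("Description:\n" + specs.description)
    return "\n\n".join(parts)


def spec_guided_translate(
    source: str,
    src_lang: str,
    tgt_lang: str,
    specs: Specs,
    llm: Callable[[str], str],
    passes_tests: Callable[[str], bool],
) -> Optional[str]:
    """Try a source-only prompt, then one prompt per specification modality,
    then all modalities together; return the first candidate that passes."""
    variants = [
        Specs(),
        Specs(static=specs.static),
        Specs(tests=specs.tests),
        Specs(description=specs.description),
        specs,
    ]
    for variant in variants:
        candidate = llm(build_prompt(source, src_lang, tgt_lang, variant))
        if passes_tests(candidate):
            return candidate
    return None  # no candidate passed; the caller may fall back or retry


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Toy model: it only produces a correct translation once test cases appear.
        if "Input-output test cases:" in prompt:
            return "fn square(x: i32) -> i32 { x * x }"
        return "fn square(x: i32) -> i32 { x + x }"

    specs = Specs(static="ensures: result == x * x",
                  tests=["square(3) == 9"],
                  description="Returns the square of its argument.")
    # Toy acceptance check standing in for compiling and running the tests.
    result = spec_guided_translate("int square(int x) { return x * x; }", "C", "Rust",
                                   specs, fake_llm, lambda code: "x * x" in code)
    print(result)
```

Trying several prompt variants and keeping the first that passes lets the verified test cases serve both as prompt context and as an acceptance check; the variant order here is an illustrative choice.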