Rectifier: Code Translation with Corrector via LLMs

Xin Yin, Chao Ni, Tien N. Nguyen, Shaohua Wang, Xiaohu Yang

2024-07-10

Abstract: Software migration is garnering increasing attention with the evolution of software and society. Early studies mainly relied on handcrafted translation rules to translate between two languages, a process that is error-prone and time-consuming. In recent years, researchers have begun to explore the use of pre-trained large language models (LLMs) in code translation. However, code translation is a complex task, and LLMs make mistakes when performing it: they all produce certain types of errors, which include (1) compilation errors, (2) runtime errors, (3) functional errors, and (4) non-terminating execution. We found that the root causes of these errors are very similar (e.g., failure to import packages, errors in loop boundaries, operator errors, and more). In this paper, we propose a general corrector, namely Rectifier, which is a micro and universal model for repairing translation errors. It learns from errors generated by existing LLMs and can be widely applied to correct errors generated by any LLM. The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability, and cross experiments also demonstrate the robustness of our method.

Software Engineering, Artificial Intelligence

What problem does this paper attempt to address?

This paper examines the errors that arise in LLM-based code translation and proposes a way to repair them. Specifically:

1. **Current State and Challenges of Code Translation**:
   - Early research relied mainly on manually written translation rules to translate between programming languages, an approach that is both error-prone and time-consuming.
   - More recently, researchers have begun to use pre-trained large language models (LLMs) for code translation. These models still produce several types of errors: compilation errors, runtime errors, functional errors, and non-terminating execution.
2. **Observations and Goals**:
   - The study found that these errors share very similar root causes, such as failing to import packages, loop boundary errors, and operator errors.
   - The goal is to improve code translation by introducing a micro model that repairs translation errors and can be applied to the output of any LLM.
3. **Main Contributions**:
   - Rectifier, a micro and universal corrector model for repairing translation errors. It learns from the errors generated by existing LLMs and can be applied to correct errors generated by any LLM.
   - Experiments on translation tasks between C++, Java, and Python show that the model repairs errors effectively, and cross experiments demonstrate the robustness of the approach.

Through this work, the paper aims to improve the quality of code translation and provide a new perspective for subsequent research.
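To make the four error categories concrete, here is a minimal sketch of how a translated Python program might be sorted into them. It is not the paper's Rectifier model or its evaluation harness. The function name `classify_translation`, its parameters, and the timeout values are hypothetical. A syntax check stands in for compilation, since Python has no separate compile step, and a timeout can only approximate non-termination.

```python
import subprocess
import sys


def classify_translation(code, expected_stdout, stdin_text="", timeout_s=5.0):
    """Classify a translated Python program into one of four error types.

    Hypothetical helper for illustration only; returns "pass" when the
    program terminates normally and prints the expected output.
    """
    # (1) Compilation error: approximated here by a syntax check.
    try:
        compile(code, "<translated>", "exec")
    except SyntaxError:
        return "compilation error"

    # Run the program in a separate interpreter so a crash or an
    # endless loop cannot take down the caller.
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # (4) Non-terminating execution: approximated by a timeout.
        return "non-terminating execution"

    # (2) Runtime error: the interpreter exited with a failure status.
    if proc.returncode != 0:
        return "runtime error"

    # (3) Functional error: ran to completion but printed the wrong result.
    if proc.stdout.strip() != expected_stdout.strip():
        return "functional error"

    return "pass"
```

With this sketch, an off-by-one loop boundary such as `print(sum(range(1, 5)))` checked against an expected output of `15` is reported as a functional error, while a missing `import math` before `math.sqrt(16)` surfaces as a runtime error. In a compiled target language such as Java, the same missing import would typically show up at compile time instead.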

