Fine-tuning Large Language Models to Improve Accuracy and Comprehensibility of Automated Code Review

Yongda Yu, Guoping Rong, Haifeng Shen, He Zhang, Dong Shao, Min Wang, Zhao Wei, Yong Xu, Juhong Wang
DOI: https://doi.org/10.1145/3695993
IF: 3.685
2024-01-01
ACM Transactions on Software Engineering and Methodology
Abstract: As code review is a tedious and costly software quality practice, researchers have proposed several machine learning-based methods to automate the process. The primary focus has been on accuracy, that is, how accurately the algorithms detect issues in the code under review. However, human intervention remains inevitable because the results produced by automated code review are not 100% correct. To help human reviewers make their final decisions on automatically generated review comments, the comprehensibility of those comments is paramount: each comment should accurately localize the detected issue, give a relevant explanation, and suggest a repair. However, this has largely been neglected in existing research. Large language models (LLMs) have the potential to generate code review comments that are more readable and comprehensible to humans thanks to their remarkable processing and reasoning capabilities. However, even mainstream LLMs perform poorly at detecting the presence of code issues because they have not been specifically trained for the binary classification task that code review requires. In this paper, we contribute Carllm (Comprehensibility of Automated Code Review using Large Language Models), a novel fine-tuned LLM that improves not only the accuracy but, more importantly, the comprehensibility of automated code review, compared with state-of-the-art pre-trained models and general LLMs.
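The abstract names four ingredients of a comprehensible automated review comment: a binary decision on whether the change has an issue, localization of that issue, an explanation, and a repair suggestion. The sketch below is a minimal illustration, assuming a simple prompt/target text format, of how one reference comment carrying those ingredients might be packaged as a supervised fine-tuning example. `ReviewComment`, `to_training_pair`, the prompt wording, and the target layout are hypothetical names and choices made for illustration; they are not Carllm's actual data format or training pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewComment:
    """Hypothetical record for one reference review comment (illustrative only)."""
    has_issue: bool              # binary decision: does the change need revision?
    line: Optional[int] = None   # localization: line in the diff the issue refers to
    explanation: str = ""        # why the code is problematic
    suggestion: str = ""         # proposed repair


def to_training_pair(diff: str, comment: ReviewComment) -> dict:
    """Serialize a diff and its reference comment into a prompt/target pair.

    The text layout here is an assumption, not the format used by Carllm.
    """
    prompt = (
        "Review the following code change. State whether it has an issue; "
        "if so, give the line, an explanation, and a suggested fix.\n\n" + diff
    )
    if not comment.has_issue:
        # Clean change: the target carries only the binary decision.
        target = "Issue: no"
    else:
        # Problematic change: binary decision first, then the fields
        # that make the comment comprehensible to a human reviewer.
        target = (
            "Issue: yes\n"
            f"Line: {comment.line}\n"
            f"Explanation: {comment.explanation}\n"
            f"Suggestion: {comment.suggestion}"
        )
    return {"prompt": prompt, "target": target}


# Usage: a C-style bug where an assignment is used instead of a comparison.
pair = to_training_pair(
    "+    if (count = 0) {",
    ReviewComment(
        has_issue=True,
        line=1,
        explanation="Assignment used where a comparison was intended.",
        suggestion="Use `==` instead of `=`.",
    ),
)
print(pair["target"])
```

Placing the binary decision first and the explanatory fields after it keeps the classification signal easy to parse while still training the model to produce localization, explanation, and repair text; whether Carllm orders or formats its targets this way is not stated in the abstract.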
What problem does this paper attempt to address?