Abstract: Several techniques have been proposed to automate code review. Early support consisted of recommending the most suitable reviewer for a given change or prioritizing review tasks. With the advent of deep learning in software engineering, the level of automation has been pushed to new heights, with approaches able to provide natural-language feedback on source code as a human reviewer would. Recent work has also documented open source projects adopting Large Language Models (LLMs) as co-reviewers. Although research in this field is very active, little is known about the actual impact of including automatically generated code reviews in the code review process. While many aspects are worth investigating, in this work we focus on three of them: (i) review quality, i.e., the reviewer's ability to identify issues in the code; (ii) review cost, i.e., the time spent reviewing the code; and (iii) reviewer's confidence, i.e., how confident the reviewer is about the feedback provided. We ran a controlled experiment with 29 experts who reviewed different programs with and without the support of an automatically generated code review. During the experiment we monitored the reviewers' activities, collecting over 50 hours of recorded code reviews. We show that reviewers consider most of the issues automatically identified by the LLM to be valid, and that the availability of an automated review as a starting point strongly influences their behavior: reviewers tend to focus on the code locations indicated by the LLM rather than searching for additional issues in other parts of the code. Reviewers who started from an automated review identified more low-severity issues, but no more high-severity issues, than reviewers following a completely manual process. Finally, the automated support did not save time and did not increase the reviewers' confidence.

Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword?
