Abstract: Code review is an important practice in software development, and one of its main objectives is to assure code quality. For this purpose, the efficacy of code review depends on the credibility of reviewers, i.e., reviewers who have a demonstrated record of making quality-enhancing comments are more credible than those who do not. Code reviewer recommendation (CRR) assists in recommending suitable reviewers for a specific objective, which in this context is the assurance of code quality. Its performance depends on how relevant its training dataset is to this objective. That dataset is composed of all reviewers’ historical review comments and often contains many comments that are irrelevant to the enhancement of code quality. Furthermore, recommendation accuracy has been adopted as the sole metric for evaluating a recommender’s performance, which is inadequate because it does not take reviewers’ relevant credibility into consideration. These two issues form the ground truth problem in CRR, as both originate from the relevance of the dataset used to train and evaluate CRR algorithms. To tackle this problem, we first propose the concept of Quality-Enhancing Review Comments (QERC), which comprises three types of comments: change-triggering inline comments, informative general comments, and approve-to-merge comments. We then devise a set of algorithms and procedures to obtain a distilled dataset by applying QERC to the original dataset. Finally, we introduce a new metric, reviewer’s credibility for quality enhancement (RCQE), as a complement to recommendation accuracy for evaluating the performance of recommenders. To validate the proposed QERC-based approach to CRR, we conduct empirical studies using real data from seven projects containing over 82K pull requests and 346K review comments.
Results show that: (a) QERC can effectively address the ground truth problem by distilling quality-enhancing comments from the dataset of original code reviews, (b) QERC can assist recommenders in finding highly credible reviewers at a slight cost in recommendation accuracy, and (c) even “wrong” recommendations made using the distilled dataset are likely to be more credible than those made using the original dataset.
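
The distillation step can be pictured with a minimal sketch. It assumes each historical comment already carries a label saying whether it falls into one of the three QERC types. The abstract does not specify how those labels are derived, so the `ReviewComment` fields, the `Kind` enum, and the helper names below are all hypothetical and are not the authors' algorithms. The per-reviewer count at the end is only an illustrative tally, not the RCQE metric.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    INLINE = "inline"      # comment attached to a specific code line
    GENERAL = "general"    # comment on the pull request as a whole
    APPROVAL = "approval"  # approve-to-merge comment


@dataclass(frozen=True)
class ReviewComment:
    reviewer: str
    kind: Kind
    # Hypothetical, pre-computed labels (assumed inputs, not from the paper).
    triggered_change: bool = False  # inline comment followed by a code change
    informative: bool = False       # general comment judged informative


def is_qerc(comment: ReviewComment) -> bool:
    """Return True if the comment belongs to one of the three QERC types."""
    if comment.kind is Kind.INLINE:
        return comment.triggered_change
    if comment.kind is Kind.GENERAL:
        return comment.informative
    return comment.kind is Kind.APPROVAL


def distill(comments: list[ReviewComment]) -> list[ReviewComment]:
    """Keep only quality-enhancing comments from the original dataset."""
    return [c for c in comments if is_qerc(c)]


def qerc_counts(comments: list[ReviewComment]) -> Counter:
    """Tally quality-enhancing comments per reviewer (illustrative only)."""
    return Counter(c.reviewer for c in distill(comments))


comments = [
    ReviewComment("alice", Kind.INLINE, triggered_change=True),
    ReviewComment("alice", Kind.GENERAL, informative=False),
    ReviewComment("bob", Kind.INLINE, triggered_change=False),
    ReviewComment("bob", Kind.APPROVAL),
    ReviewComment("carol", Kind.GENERAL, informative=True),
]
distilled = distill(comments)
counts = qerc_counts(comments)
```

In this toy input, two of the five comments are dropped as irrelevant to quality enhancement. A recommender trained on `distilled` rather than `comments` would see only the remaining three.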

Distilling Quality Enhancing Comments from Code Reviews to Underpin Reviewer Recommendation
