Evaluating Learning-to-Rank Models for Prioritizing Code Review Requests Using Process Simulation

Lanxin Yang, Bohan Liu, Junyu Jia, Junming Xue, Jinwei Xu, Alberto Bacchelli, He Zhang
DOI: https://doi.org/10.1109/saner56733.2023.00050
2023-01-01
Abstract: In large-scale, active software projects, one of the main challenges in code review is prioritizing the many Code Review Requests (CRRs) these projects receive. Prior studies have developed many Learning-to-Rank (LtR) models to support CRR prioritization and have adopted a rich set of evaluation metrics to compare their performance. However, these evaluations were performed before observing the complex interactions between CRRs and reviewers, and among review activities, in real-world code reviews. Such a pre-review evaluation offers little indication of how effectively LtR models contribute to code reviews. This study aims to perform a post-review evaluation of LtR models for prioritizing CRRs. To establish the evaluation environment, we employ Software Process Simulation Modeling (SPSM) based on the Discrete-Event Simulation (DES) paradigm to simulate real-world code review processes, together with three customized evaluation metrics. We develop seven LtR models and use the historical review orders of CRRs as baselines for evaluation. The results indicate that employing LtR can effectively accelerate both the completion of CRR reviews and the delivery of qualified code changes. Among the seven LtR models, LambdaMART is particularly beneficial for accelerating completion, and AdaRank for accelerating delivery. This study empirically demonstrates the effectiveness of using DES-based SPSM to simulate code review processes, the benefits of using LtR to prioritize CRRs, and the specific advantages of several LtR models. It also offers new ideas for software organizations that seek to evaluate LtR models and other artificial intelligence-powered software techniques. Data and materials: https://figshare.com/s/a033e99cd2a61e64c8bc.
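To make the post-review evaluation idea concrete, here is a minimal sketch of a discrete-event simulation that replays the same set of CRRs under two review orders and compares their mean time to completion. This is not the authors' SPSM model and does not reproduce their three metrics. The `CRR` fields, the single-stage review with a fixed pool of reviewers, the `score` values standing in for an LtR model's output, and the toy data are all hypothetical assumptions.

```python
import heapq
from dataclasses import dataclass


@dataclass
class CRR:
    crr_id: str
    arrival: float  # hypothetical: time the request enters the queue (hours)
    effort: float   # hypothetical: review effort required (hours)
    score: float    # hypothetical: priority score from some ranking model


def simulate(crrs, n_reviewers, key):
    """Discrete-event simulation of a single-stage review queue.

    `key` maps a CRR to a sort key; smaller keys are reviewed first.
    Returns a dict mapping crr_id to completion time.
    """
    events = []  # (time, seq, kind, crr); seq breaks ties deterministically
    seq = 0
    for c in crrs:
        heapq.heappush(events, (c.arrival, seq, "arrive", c))
        seq += 1
    queue = []  # pending CRRs ordered by key, then id
    free = n_reviewers
    done = {}
    while events:
        now = events[0][0]
        # Process every event at the current time before dispatching,
        # so simultaneous arrivals are ranked against each other.
        while events and events[0][0] == now:
            _, _, kind, crr = heapq.heappop(events)
            if kind == "arrive":
                heapq.heappush(queue, (key(crr), crr.crr_id, crr))
            else:  # "finish": record completion and release the reviewer
                done[crr.crr_id] = now
                free += 1
        # Assign the highest-priority pending CRRs to idle reviewers.
        while free > 0 and queue:
            _, _, nxt = heapq.heappop(queue)
            free -= 1
            heapq.heappush(events, (now + nxt.effort, seq, "finish", nxt))
            seq += 1
    return done


def mean_turnaround(crrs, done):
    """Average time from arrival to completion across all CRRs."""
    return sum(done[c.crr_id] - c.arrival for c in crrs) / len(crrs)


crrs = [
    CRR("A", 0.0, 5.0, 0.1),
    CRR("B", 0.0, 1.0, 0.9),
    CRR("C", 0.0, 2.0, 0.5),
]

# Baseline: arrival ("historical") order; alternative: descending model score.
historical = simulate(crrs, 1, lambda c: (c.arrival, c.crr_id))
ranked = simulate(crrs, 1, lambda c: -c.score)
print(f"{mean_turnaround(crrs, historical):.2f}")  # prints 6.33
print(f"{mean_turnaround(crrs, ranked):.2f}")      # prints 4.00
```

With one reviewer, the score-based order happens to review the quick requests first, so the average wait drops. Whether a real LtR model yields such gains depends on the simulated process, which is exactly what a post-review evaluation is meant to reveal.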