Abstract

Background: Incremental value (IncV) evaluates the change in performance between an existing risk model and a new model. Different IncV metrics do not always agree with each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. Such disagreement is not uncommon and can create confusion when assessing whether the added information improves the model's prediction accuracy.

Methods: In this article, we examine the analytical connections and differences between the AUC IncV (ΔAUC) and the AP IncV (ΔAP). We also compare the true values of these two IncV metrics in a numerical study. Additionally, because both are semi-proper scoring rules, the numerical study compares them with a strictly proper scoring rule: the IncV of the scaled Brier score (ΔsBrS).

Results: We demonstrate that ΔAUC and ΔAP are both weighted averages of the changes (from the existing model to the new one) in the separation of the risk score distributions between events and non-events. However, ΔAP assigns heavier weights to changes in higher-risk regions, whereas ΔAUC weights the changes equally. Because of this difference, the two IncV metrics can disagree, and the numerical study shows that their disagreement becomes more pronounced as the event rate decreases. The numerical study also shows that ΔAP spans a wide range, from negative to positive, whereas the range of ΔAUC is much smaller. In addition, ΔAP and ΔsBrS are highly consistent, but ΔAUC is negatively correlated with ΔsBrS and ΔAP when the event rate is low.

Conclusions: ΔAUC treats the wins and losses of a new risk model equally across different risk regions. When neither the existing nor the new model is the true model, this equal weighting could attenuate the superior performance of the new model in a sub-region. In contrast, ΔAP accentuates the change in prediction accuracy in higher-risk regions.
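To make the three metrics concrete, here is a minimal standard-library Python sketch. It computes empirical ΔAUC, ΔAP, and ΔsBrS for an existing and a new model on a small hypothetical dataset. The estimator choices are assumptions for illustration, not the authors' code:

- AUC is the Mann-Whitney proportion of correctly ordered event/non-event pairs, with ties counting 1/2.
- AP is the step-wise sum, over distinct thresholds, of the recall increment times precision. This is the same definition scikit-learn's `average_precision_score` uses.
- The scaled Brier score is taken as 1 - BS / (π̄(1 - π̄)), with π̄ the observed event rate.

The function names and toy data are hypothetical, and both events and non-events are assumed to be present.

```python
# Minimal, standard-library sketch of empirical incremental values (IncV).
# Estimator choices are assumptions for illustration, not the paper's code.


def auc(y, scores):
    """Mann-Whitney AUC: share of (event, non-event) pairs ranked correctly, ties = 1/2.

    Uses an O(n_events * n_nonevents) double loop, which is fine for small examples.
    """
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y, scores):
    """Step-wise AP: sum over distinct thresholds of (recall gain) * precision."""
    n_pos = sum(y)
    ranked = sorted(zip(scores, y), key=lambda pair: pair[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations tied at this score are classified as positive together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def scaled_brier(y, probs):
    """Assumed definition: 1 - BS / (rate * (1 - rate)); needs probabilities, not just ranks."""
    n = len(y)
    brier = sum((p - t) ** 2 for p, t in zip(probs, y)) / n
    rate = sum(y) / n
    return 1.0 - brier / (rate * (1.0 - rate))


def incremental_value(metric, y, old_scores, new_scores):
    """IncV = metric(new model) - metric(existing model)."""
    return metric(y, new_scores) - metric(y, old_scores)


# Hypothetical toy data: 2 events among 10 subjects (event rate 0.2).
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
old = [0.50, 0.45, 0.60, 0.30, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05]
new = [0.80, 0.22, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.08, 0.05]

for name, metric in [("dAUC", auc), ("dAP", average_precision), ("dsBrS", scaled_brier)]:
    print(name, round(incremental_value(metric, y, old, new), 4))
```

In this toy example, the existing model ranks each event below one non-event. The new model moves one event to the top of the ranking but pushes the other below three non-events. As a result, ΔAUC is negative (-0.0625), while ΔAP is positive (about 0.117) and ΔsBrS is slightly positive (about 0.028). This is only a tiny illustration of the kind of disagreement described above, not a reproduction of the paper's numerical study.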