Abstract: The Receiver Operating Characteristic (ROC) curve is a key method for evaluating the effectiveness of diagnostic medical indicators and is widely applied. However, errors are inevitable in data acquisition. Discussion of error, and of methods for improving and handling data, is therefore both a focus of academic work and of practical significance. Unlike in general statistics, the diversity of error situations, ranges, and impacts in biostatistics often presents unique challenges. In practical scenarios such as drug experiments, limited sample sizes and variation in individual responses to the same drug mean that error models, data scales, and statistical processing must be chosen on the basis of historical data, biomedical knowledge, and experimental data. The appropriate method also depends on the specific objectives of the experiment, and choosing it well is essential for producing compelling conclusions. Importantly, biology has introduced its own methods for addressing errors, such as cross-comparison experiments and repeated experiments, and data processing must adapt to these changes in experimental design. This paper presents a statistical approach based on the widely used practice of error reduction through repeated experiments, in the context of assessing generic drug consistency. It first summarizes the common types of error encountered in biostatistics and the corresponding analysis, control, and optimization measures. It then explores several methods for calculating the Area Under the ROC Curve (AUC) when sampling error is present and repeated experiments are used to reduce it. Finally, it validates the methods under different error scenarios using simulated data, highlighting which statistical models are suitable, and why, when the difference between healthy and diseased populations is not substantial.
This paper offers practical guidance on handling various types of real-world data so as to reduce the effect of errors and obtain more accurate statistical conclusions.
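As a rough illustration of the idea summarized above, the following Python sketch simulates healthy and diseased scores, adds measurement error, averages repeated measurements per subject, and computes a nonparametric (Mann-Whitney) AUC estimate. It is not the paper's procedure: the normal score model, the additive independent error, and the names `empirical_auc` and `simulate_auc` with their parameters are all assumptions made for this example.

```python
import numpy as np


def empirical_auc(healthy, diseased):
    """Mann-Whitney AUC estimate: P(diseased > healthy) + 0.5 * P(tie)."""
    healthy = np.asarray(healthy, dtype=float)
    diseased = np.asarray(diseased, dtype=float)
    # All pairwise differences between diseased and healthy scores.
    diff = diseased[:, None] - healthy[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def simulate_auc(n_repeats, n_per_group=200, shift=0.5, error_sd=1.0, seed=0):
    """AUC from scores averaged over n_repeats noisy measurements per subject.

    Assumed model (hypothetical, for illustration only):
    true scores are N(0, 1) for healthy and N(shift, 1) for diseased subjects,
    and each measurement adds independent N(0, error_sd^2) error.
    """
    rng = np.random.default_rng(seed)
    true_healthy = rng.normal(0.0, 1.0, n_per_group)
    true_diseased = rng.normal(shift, 1.0, n_per_group)
    # One row per subject, one column per repeated measurement.
    obs_healthy = true_healthy[:, None] + rng.normal(
        0.0, error_sd, (n_per_group, n_repeats)
    )
    obs_diseased = true_diseased[:, None] + rng.normal(
        0.0, error_sd, (n_per_group, n_repeats)
    )
    # Averaging the repeats shrinks the error variance by a factor of n_repeats.
    return empirical_auc(obs_healthy.mean(axis=1), obs_diseased.mean(axis=1))


if __name__ == "__main__":
    for k in (1, 3, 10):
        print(f"repeats={k:2d}  AUC={simulate_auc(k):.3f}")
```

Under this assumed model, measurement error pulls the estimated AUC toward 0.5, and averaging more repeats tends to move it back toward the error-free value. With small groups and a small `shift`, the estimate from a single simulated data set remains noisy.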
