Comparison of methods for calculating confidence intervals of AUC in ROC curve considering sampling error
Yuxuan She, Jiahao Cui, Xinran Liu
DOI: https://doi.org/10.54254/2755-2721/46/20241317
2024-03-15
Abstract: The Receiver Operating Characteristic (ROC) curve is a widely used method for evaluating the effectiveness of diagnostic medical indicators. Errors, however, are inevitable during data acquisition, so the analysis of error and of methods for handling and improving data is both a focus of academic discussion and a matter of practical significance. Unlike in general statistics, the diversity of error situations, ranges, and impacts in biostatistics often presents unique challenges. In practical scenarios such as drug experiments, limited sample sizes and variation in individual responses to the same drug mean that error models, data scales, and statistical processing must be chosen on the basis of historical data, biomedical knowledge, and experimental data. The appropriate method also depends on the specific objectives of the experiment, and choosing it well is essential for producing compelling conclusions. In addition, biology has introduced experimental designs that address errors, such as cross-comparison experiments and repeated experiments, and data processing must adapt to these designs. This paper presents a statistical approach based on the widely used practice of reducing error through repeated experiments, in the context of assessing generic drug consistency. It first summarizes the common types of error encountered in biostatistics and the corresponding analytical, control, and optimization measures. It then explores several methods for calculating the Area Under the ROC Curve (AUC) when sampling error is present and repeated experiments are used to reduce it. Finally, it validates the methods on simulated data under different error scenarios, highlighting which statistical models are suitable, and why, when the difference between healthy and diseased populations is not substantial.
This paper offers insights into handling various types of real-world data so as to reduce errors and obtain more accurate statistical conclusions.
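As a rough illustration of the setting described in the abstract, the sketch below simulates repeated noisy measurements for a healthy and a diseased group, averages the repeats for each subject, and computes the AUC with a confidence interval. It is not the paper's method: the percentile bootstrap is a generic technique swapped in for illustration, and the function names (`auc_mann_whitney`, `bootstrap_auc_ci`, `simulate_group`), the normal distributions, and all parameter values are assumptions.

```python
import numpy as np

def auc_mann_whitney(healthy, diseased):
    """Empirical AUC: P(diseased > healthy) + 0.5 * P(tie)."""
    h = np.asarray(healthy, dtype=float)[:, None]
    d = np.asarray(diseased, dtype=float)[None, :]
    return float(np.mean((d > h) + 0.5 * (d == h)))

def bootstrap_auc_ci(healthy, diseased, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling each group independently."""
    rng = np.random.default_rng(seed)
    h = np.asarray(healthy, dtype=float)
    d = np.asarray(diseased, dtype=float)
    aucs = np.empty(n_boot)
    for b in range(n_boot):
        hb = rng.choice(h, size=h.size, replace=True)
        db = rng.choice(d, size=d.size, replace=True)
        aucs[b] = auc_mann_whitney(hb, db)
    low, high = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return float(low), float(high)

def simulate_group(n_subjects, mean, n_repeats, noise_sd, rng):
    """Hypothetical model: true value ~ N(mean, 1) plus N(0, noise_sd^2)
    measurement error per repeat; repeats are averaged per subject."""
    true_values = rng.normal(mean, 1.0, size=n_subjects)
    noise = rng.normal(0.0, noise_sd, size=(n_subjects, n_repeats))
    return (true_values[:, None] + noise).mean(axis=1)

rng = np.random.default_rng(42)
# Assumed scenario: a small shift (0.5) between the two populations.
healthy = simulate_group(100, 0.0, n_repeats=3, noise_sd=1.0, rng=rng)
diseased = simulate_group(100, 0.5, n_repeats=3, noise_sd=1.0, rng=rng)

auc = auc_mann_whitney(healthy, diseased)
ci_low, ci_high = bootstrap_auc_ci(healthy, diseased)
print(f"AUC = {auc:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Averaging `n_repeats` measurements reduces the measurement-error variance of each subject's value by a factor of `n_repeats`, so rerunning the sketch with different `n_repeats` or `noise_sd` values gives a feel for how repeated experiments affect the AUC estimate and the width of its interval.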