Reliability and Variability of Ki-67 Digital Image Analysis Methods for Clinical Diagnostics in Breast Cancer
Melanie Dawe, Wei Shi, Tian Y Liu, Katherine Lajkosz, Yukiko Shibahara, Nakita E K Gopal, Rokshana Geread, Seyed Mirjahanmardi, Carrie X Wei, Sehrish Butt, Moustafa Abdalla, Sabrina Manolescu, Sheng-Ben Liang, Dianne Chadwick, Michael H A Roehrl, Trevor D McKee, Adewunmi Adeoye, David McCready, April Khademi, Fei-Fei Liu, Anthony Fyles, Susan J Done
DOI: https://doi.org/10.1016/j.labinv.2024.100341
IF: 5.511
2024-05-01
Laboratory Investigation
Abstract: Ki-67 is a nuclear protein associated with proliferation and a strong candidate biomarker in breast cancer, but it is not routinely measured in current clinical management owing to a lack of standardization. Digital image analysis (DIA) is a promising technology that could enable high-throughput analysis and standardization. However, data on the clinical reliability of different DIA methods, and on their intra- and interalgorithmic variability, are scarce. In this study, we compared manual Ki-67 counts with scores from 5 DIA methods in a set of breast cancer cases (n = 278) in which manually counted Ki-67 has already been shown to have prognostic value. The DIA methods were Aperio ePathology (Leica Biosystems), Definiens Tissue Studio (Definiens AG), QuPath, an unsupervised immunohistochemical color histogram algorithm, and the deep-learning pipeline piNET. The piNET system achieved high agreement (intraclass correlation coefficient [ICC]: 0.850) and correlation (R = 0.85) with the reference score. The QuPath algorithm showed a high degree of reproducibility across all rater instances (ICC: 0.889). Although piNET performed well against absolute manual counts, none of the tested DIA methods classified cases at common Ki-67 cutoffs with high agreement, and none reached the clinically relevant Cohen's κ of at least 0.8. The highest agreement was a Cohen's κ of 0.73, achieved by piNET at the 20% and 25% cutoffs. The main contributors to interalgorithmic variation and poor cutoff characterization were heterogeneous tumor biology, differences in algorithm implementation, and the choice of parameter settings. Image segmentation appears to be the primary source of intra-algorithmic variation in the semiautomated methods, and correcting it requires substantial manual intervention. Automated pipelines, such as piNET, may be crucial for developing robust, reproducible, and unbiased DIA approaches that accurately quantify Ki-67 for clinical diagnosis in the future.
Pathology; Medicine, Research & Experimental
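The agreement statistics named in the abstract (Pearson R, ICC, and Cohen's κ at Ki-67 cutoffs) can be illustrated with a short, dependency-free Python sketch. It is not the authors' analysis code. Assumptions: a case is treated as Ki-67-high when its score is at or above the cutoff; the ICC is taken as ICC(2,1) (two-way random effects, absolute agreement, single rater, in Shrout and Fleiss notation), since the abstract does not state which ICC form was used; and the scores in the demo are hypothetical toy values, not study data.

```python
from statistics import mean


def binarize(scores, cutoff):
    """Label each Ki-67 index (% positive tumor nuclei) as high (1) or low (0).

    Assumption: a case counts as Ki-67-high when its score is >= cutoff.
    """
    return [1 if s >= cutoff else 0 for s in scores]


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of binary (0/1) labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    # Chance agreement: both raters independently call 'high' or both call 'low'.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return float("nan")  # both raters put every case in one class: undefined
    return (observed - expected) / (1 - expected)


def pearson_r(x, y):
    """Pearson correlation between two lists of continuous scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the Ki-67 score given to case i by rater or method j.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: cases (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Hypothetical toy scores (% Ki-67-positive nuclei), not data from the study.
    manual = [5.0, 12.0, 18.0, 22.0, 30.0, 45.0]
    dia = [6.0, 10.0, 21.0, 19.0, 33.0, 40.0]
    print(f"Pearson R: {pearson_r(manual, dia):.3f}")
    print(f"ICC(2,1): {icc_2_1([list(p) for p in zip(manual, dia)]):.3f}")
    for cutoff in (20, 25):
        kappa = cohens_kappa(binarize(manual, cutoff), binarize(dia, cutoff))
        print(f"Cohen's kappa at {cutoff}%: {kappa:.3f}")
```

Because κ is computed on binarized labels, two methods can correlate well on continuous scores yet disagree on cases that sit near a cutoff. This is consistent with the gap the abstract reports between piNET's correlation (R = 0.85) and its best cutoff agreement (κ = 0.73).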