A New Method of Identifying Pathologic Complete Response After Neoadjuvant Chemotherapy for Breast Cancer Patients Using a Population-Based Electronic Medical Record System

Guosong Wu, Cheligeer Cheligeer, Anne-Marie Brisson, May Lynn Quan, Winson Y. Cheung, Darren Brenner, Sasha Lupichuk, Carolin Teman, Robert Barkev Basmadjian, Brittany Popwich, Yuan Xu
DOI: https://doi.org/10.1245/s10434-022-12955-6
Impact factor: 4.339
2022-12-21
Annals of Surgical Oncology
Abstract

Background: Accurate, timely, and cost-efficient identification of pathologic complete response (pCR) from population-based electronic narrative data is critical. This study aimed to derive and validate a set of natural language processing (NLP)-based machine-learning algorithms to capture pCR from the surgical pathology reports of breast cancer patients who underwent neoadjuvant chemotherapy (NAC).

Methods: This retrospective cohort study included all patients with invasive breast cancer who underwent NAC and subsequent curative-intent surgery during an admission at any of the four tertiary acute care hospitals in Calgary, Alberta, Canada, between 1 January 2010 and 31 December 2017. Surgical pathology reports were extracted and processed with NLP. Decision tree classifiers were constructed and validated against chart review results. The machine-learning algorithms were evaluated with a performance matrix comprising sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, area under the receiver operating characteristic curve (AUC), and F1 score.

Results: The study included 351 female patients, of whom 102 (29%) achieved pCR after NAC. The high-sensitivity algorithm achieved a sensitivity of 90.5% (95% confidence interval [CI], 69.6–98.9%), a PPV of 76% (95% CI, 59.6–87.2%), an accuracy of 88.6% (95% CI, 78.7–94.9%), an AUC of 0.891 (95% CI, 0.795–0.987), and an F1 score of 82.61. The high-PPV algorithm reached a sensitivity of 85.7% (95% CI, 63.7–97%), a PPV of 81.8% (95% CI, 63.4–92.1%), an accuracy of 90% (95% CI, 80.5–95.9%), an AUC of 0.888 (95% CI, 0.790–0.985), and an F1 score of 83.72. The high-F1-score algorithm performed equivalently to the high-PPV algorithm.

Conclusion: The developed algorithms identified pCR with excellent accuracy from the surgical pathology reports of breast cancer patients who received NAC.
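The abstract names its evaluation metrics without defining them. As a minimal sketch (not the authors' code), the Python below computes each metric from a 2×2 confusion matrix of algorithm output versus chart review. The `ConfusionMatrix` class and `performance` function are hypothetical names. The counts in the usage line are illustrative only: the abstract does not report them, but they were chosen so that the rounded sensitivity, PPV, accuracy, AUC, and F1 match the high-sensitivity algorithm's figures. The AUC line assumes the classifier emits hard 0/1 labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary pCR classifier judged against chart review."""
    tp: int  # algorithm says pCR, chart review confirms pCR
    fp: int  # algorithm says pCR, chart review finds residual disease
    fn: int  # algorithm misses a chart-confirmed pCR
    tn: int  # neither algorithm nor chart review finds pCR


def performance(cm: ConfusionMatrix) -> dict:
    """Return the abstract's metrics as proportions in [0, 1].

    Assumes every denominator is non-zero.
    """
    sensitivity = cm.tp / (cm.tp + cm.fn)
    specificity = cm.tn / (cm.tn + cm.fp)
    ppv = cm.tp / (cm.tp + cm.fp)
    npv = cm.tn / (cm.tn + cm.fn)
    accuracy = (cm.tp + cm.tn) / (cm.tp + cm.fp + cm.fn + cm.tn)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Assumption: with hard 0/1 predictions the ROC curve has one operating
    # point, so the trapezoidal AUC reduces to (sensitivity + specificity) / 2.
    auc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "auc": auc,
    }


if __name__ == "__main__":
    # Hypothetical 70-report validation set; not counts reported by the paper.
    example = ConfusionMatrix(tp=19, fp=6, fn=2, tn=43)
    for name, value in performance(example).items():
        print(f"{name}: {value:.3f}")
```

Similarly, hypothetical counts of tp=18, fp=4, fn=3, tn=45 are consistent with the high-PPV algorithm's rounded figures. If the decision trees instead output class probabilities, the reported AUC would come from a full ROC curve and the single-point shortcut above would not apply.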
oncology, surgery