Criteria for Identifying Radiologists with Acceptable Screening Mammography Interpretive Performance based on Multiple Performance Measures
D. Miglioretti, L. Ichikawa, Robert A. Smith, Lawrence W. Bassett, S. Feig, B. Monsees, J. Parikh, Robert D. Rosenberg, E. Sickles, P. Carney
Abstract:
Objective: Using a combination of performance measures, we updated previously proposed criteria for identifying physicians whose performance interpreting screening mammograms may indicate suboptimal interpretation skills.
Materials and Methods: In this Institutional Review Board-approved, HIPAA-compliant study, six expert breast imagers used a method based on the Angoff approach to update criteria for acceptable mammography performance on the basis of combined performance measures: (Group 1) sensitivity and specificity, for facilities with complete capture of false-negative cancers; and (Group 2) cancer detection rate (CDR), recall rate, and positive predictive value of a recall (PPV1), for facilities that cannot capture false negatives but have reliable cancer follow-up information for positive mammograms. Decisions were informed by normative data from the Breast Cancer Surveillance Consortium (BCSC).
Results: Updated, combined ranges for acceptable sensitivity and specificity of screening mammography are: (1) sensitivity ≥80% and specificity ≥85% or (2) sensitivity 75–79% and specificity 88–97%. Updated ranges for CDR, recall rate, and PPV1 are: (1) CDR ≥6/1000, recall rate 3–20%, and any PPV1; (2) CDR 4–6/1000, recall rate 3–15%, and PPV1 ≥3%; or (3) CDR 2.5–4/1000, recall rate 5–12%, and PPV1 3–8%. Using the original criteria, 51% of BCSC radiologists had acceptable sensitivity and specificity; 40% had acceptable CDR, recall rate, and PPV1. Using the combined criteria, 69% had acceptable sensitivity and specificity and 62% had acceptable CDR, recall rate, and PPV1.
Conclusion: The combined criteria improve on previous criteria by considering the inter-relationships of multiple performance measures, and they broaden the acceptable performance ranges compared with previous criteria based on individual measures.
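The combined ranges reported in the Results can be read as simple decision rules. The following Python sketch illustrates one such reading; it is not the authors' implementation. The function names are hypothetical, all inputs are assumed to be percentages (0 to 100) except CDR, which is per 1000 screens, and the handling of boundaries (for example, treating "75–79%" as 75 up to but not including 80, and "4–6/1000" as 4 up to but not including 6) is an assumption, since the abstract does not specify it.

```python
def acceptable_sens_spec(sensitivity: float, specificity: float) -> bool:
    """Group 1: combined sensitivity/specificity criteria (both in percent)."""
    # Range (1): sensitivity >= 80% and specificity >= 85%.
    if sensitivity >= 80 and specificity >= 85:
        return True
    # Range (2): sensitivity 75-79% and specificity 88-97%.
    # Assumption: "75-79%" is treated as the half-open interval [75, 80).
    return 75 <= sensitivity < 80 and 88 <= specificity <= 97


def acceptable_cdr_recall_ppv1(
    cdr_per_1000: float, recall_rate: float, ppv1: float
) -> bool:
    """Group 2: combined CDR (per 1000), recall rate (%), and PPV1 (%) criteria."""
    # Range (1): CDR >= 6/1000, recall rate 3-20%, any PPV1.
    if cdr_per_1000 >= 6:
        return 3 <= recall_rate <= 20
    # Range (2): CDR 4-6/1000, recall rate 3-15%, PPV1 >= 3%.
    # Assumption: a CDR of exactly 6 falls under range (1), not range (2).
    if 4 <= cdr_per_1000 < 6:
        return 3 <= recall_rate <= 15 and ppv1 >= 3
    # Range (3): CDR 2.5-4/1000, recall rate 5-12%, PPV1 3-8%.
    # Assumption: a CDR of exactly 4 falls under range (2), not range (3).
    if 2.5 <= cdr_per_1000 < 4:
        return 5 <= recall_rate <= 12 and 3 <= ppv1 <= 8
    return False
```

Under this reading, a lower sensitivity or CDR is acceptable only when paired with a narrower range on the other measures, which is the inter-relationship the combined criteria are meant to capture.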