Abstract

Background: We propose a decision-referral approach for integrating artificial intelligence (AI) into the breast-cancer screening pathway, whereby the algorithm makes predictions on the basis of its quantification of uncertainty. Algorithmic assessments with high certainty are made automatically, whereas assessments with lower certainty are referred to the radiologist. This two-part AI system can triage normal mammography exams and provide post-hoc cancer detection to maintain a high degree of sensitivity. This study aimed to evaluate the sensitivity and specificity of this AI system when used either as a standalone system or within a decision-referral approach, compared with the original radiologist decision.

Methods: We used a retrospective dataset consisting of 1 193 197 full-field, digital mammography studies carried out between Jan 1, 2007, and Dec 31, 2020, from eight screening sites participating in the German national breast-cancer screening programme. We derived an internal-test dataset from six screening sites (1670 screen-detected cancers and 19 997 normal mammography exams), and an external-test dataset of breast-cancer screening exams (2793 screen-detected cancers and 80 058 normal exams) from two additional screening sites. We used these datasets to evaluate the sensitivity and specificity of an AI algorithm when used either as a standalone system or within a decision-referral approach, compared with the original individual radiologist decision at the point-of-screen reading ahead of the consensus conference. Different configurations of the AI algorithm were evaluated. To account for the enrichment of the datasets caused by oversampling cancer cases, weights were applied to reflect the actual distribution of study types in the screening programme. Triaging performance was evaluated as the rate of exams correctly identified as normal. Sensitivity across clinically relevant subgroups, screening sites, and device manufacturers was compared between standalone AI, the radiologist, and decision referral. We present receiver operating characteristic (ROC) curves and area under the ROC curve (AUROC) to evaluate AI-system performance over its entire operating range. Comparison with radiologists and subgroup analysis were based on sensitivity and specificity at clinically relevant configurations.

Findings: The exemplary configuration of the AI system in standalone mode achieved a sensitivity of 84·2% (95% CI 82·4-85·8) and a specificity of 89·5% (89·0-89·9) on internal-test data, and a sensitivity of 84·6% (83·3-85·9) and a specificity of 91·3% (91·1-91·5) on external-test data, but was less accurate than the average unaided radiologist. By contrast, the simulated decision-referral approach significantly improved upon radiologist sensitivity by 2·6 percentage points and specificity by 1·0 percentage points, corresponding to a triaging performance of 63·0% on the external dataset; the AUROC was 0·982 (95% CI 0·978-0·986) on the subset of studies assessed by AI, surpassing radiologist performance. The decision-referral approach also yielded significant increases in sensitivity for a number of clinically relevant subgroups, including subgroups of small lesion sizes and invasive carcinomas. Sensitivity of the decision-referral approach was consistent across the eight included screening sites and three device manufacturers.

Interpretation: The decision-referral approach leverages the strengths of both the radiologist and AI, demonstrating improvements in sensitivity and specificity surpassing those of the individual radiologist and of the standalone AI system. This approach has the potential to improve the screening accuracy of radiologists, is adaptive to the requirements of screening, and could allow for the reduction of workload ahead of the consensus conference, without discarding the generalised knowledge of radiologists.

Funding: Vara.
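
As an illustration only, the decision-referral logic and the weighted evaluation described in the abstract might be sketched as follows. The abstract does not state how the AI system quantifies uncertainty, which thresholds it uses, or exactly how the triage and post-hoc detection parts are combined with the radiologist's read, so the score scale, both threshold values, the function names, and the per-exam weights below are all hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical thresholds on an assumed AI malignancy score in [0, 1].
# They are NOT taken from the study; they only illustrate the idea.
NORMAL_TRIAGE_THRESHOLD = 0.05  # at or below: AI is confident the exam is normal
SAFETY_NET_THRESHOLD = 0.95     # at or above: AI is confident the exam is suspicious


def decision_referral(ai_score: float, radiologist_recall: bool) -> Tuple[bool, str]:
    """Return (recall decision, decision maker) for a single exam.

    High-certainty AI assessments are applied automatically; everything
    in between is referred to the radiologist's original decision.
    """
    if ai_score <= NORMAL_TRIAGE_THRESHOLD:
        return False, "ai"
    if ai_score >= SAFETY_NET_THRESHOLD:
        return True, "ai"
    return radiologist_recall, "radiologist"


def weighted_sens_spec(
    decisions: Iterable[bool],
    has_cancer: Iterable[bool],
    weights: Iterable[float],
) -> Tuple[float, float]:
    """Sensitivity and specificity with per-exam weights.

    The weights stand in for the re-weighting that corrects for
    oversampled cancer cases in an enriched dataset.
    """
    tp = fn = tn = fp = 0.0
    for decision, cancer, weight in zip(decisions, has_cancer, weights):
        if cancer:
            if decision:
                tp += weight
            else:
                fn += weight
        else:
            if decision:
                fp += weight
            else:
                tn += weight
    return tp / (tp + fn), tn / (tn + fp)


# Toy exams: (ai_score, radiologist_recall, has_cancer, weight). Invented data.
toy_exams = [
    (0.01, False, False, 10.0),  # triaged as normal by the AI
    (0.99, False, True, 1.0),    # missed by the radiologist, caught by the AI
    (0.50, True, True, 1.0),     # referred; radiologist recalls correctly
    (0.50, True, False, 10.0),   # referred; radiologist false positive
    (0.50, False, False, 10.0),  # referred; radiologist correctly dismisses
]

referral_decisions = [decision_referral(s, r)[0] for s, r, _, _ in toy_exams]
sensitivity, specificity = weighted_sens_spec(
    referral_decisions,
    [cancer for _, _, cancer, _ in toy_exams],
    [weight for _, _, _, weight in toy_exams],
)
```

In this toy setup the radiologist alone would miss the second exam, whereas the combined pathway recalls it, which mirrors the kind of sensitivity gain the abstract reports without reproducing any of its numbers.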

Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms

International evaluation of an AI system for breast cancer screening

AI-integrated Screening to Replace Double Reading of Mammograms: A Population-wide Accuracy and Feasibility Study

Impact of Different Mammography Systems on Artificial Intelligence Performance in Breast Cancer Screening

Comparison of AI-integrated pathways with human-AI interaction in population mammographic screening for breast cancer

Protocol for evaluating the fitness for purpose of an artificial intelligence product for radiology reporting in the BreastScreen New South Wales breast cancer screening programme

Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy

Abstract A088: Utilizing Machine Learning Techniques to Investigate Mammograms for Breast Cancer Detection

Comparison of AI-integrated pathways with human-AI interaction for population mammographic screening

Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis

External Validation of a Commercial Artificial Intelligence Algorithm on a Diverse Population for Detection of False Negative Breast Cancers

AI for interpreting screening mammograms: implications for missed cancer in double reading practices and challenging-to-locate lesions

Artificial intelligence for breast cancer screening: Opportunity or hype?

Artificial intelligence in mammography: a systematic review of the external validation

Evaluation of the Combination of Artificial Intelligence and Radiologist Assessments to Interpret Malignant Architectural Distortion on Mammography

Can we reduce the workload of mammographic screening by automatic identification of normal exams with artificial intelligence? A feasibility study

Artificial Intelligence System Reduces False-Positive Findings in the Interpretation of Breast Ultrasound Exams

Can artificial intelligence replace ultrasound as a complementary tool to mammogram for the diagnosis of the breast cancer?

Retrospective analysis of the effect on interval cancer rate of adding an artificial intelligence algorithm to the reading process for two-dimensional full-field digital mammography

AI integration improves breast cancer screening in a real-world, retrospective cohort study

The first 10,000 mammography exams performed as part of the “Description and interpretation of mammography data using artificial intelligence” service.