Abstract: Background: We propose a decision-referral approach for integrating artificial intelligence (AI) into the breast-cancer screening pathway, whereby the algorithm makes predictions on the basis of its quantification of uncertainty. Algorithmic assessments with high certainty are made automatically, whereas assessments with lower certainty are referred to the radiologist. This two-part AI system can triage normal mammography exams and provide post-hoc cancer detection to maintain a high degree of sensitivity. This study aimed to evaluate the sensitivity and specificity of this AI system when used either as a standalone system or within a decision-referral approach, compared with the original radiologist decision. Methods: We used a retrospective dataset consisting of 1 193 197 full-field digital mammography studies carried out between Jan 1, 2007, and Dec 31, 2020, at eight screening sites participating in the German national breast-cancer screening programme. We derived an internal-test dataset from six screening sites (1670 screen-detected cancers and 19 997 normal mammography exams) and an external-test dataset (2793 screen-detected cancers and 80 058 normal exams) from two additional screening sites. On both datasets, we evaluated the sensitivity and specificity of the AI algorithm, used either as a standalone system or within a decision-referral approach, against the original individual radiologist decision at the point of screen reading, ahead of the consensus conference. Different configurations of the AI algorithm were evaluated. To account for the enrichment of the datasets caused by oversampling cancer cases, weights were applied to reflect the actual distribution of study types in the screening programme. Triaging performance was evaluated as the rate of exams correctly identified as normal.
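The decision-referral rule, the enrichment weighting, and the triaging metric described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the AI is assumed to output one malignancy score per exam, and two hypothetical cut-offs, `normal_threshold` and `cancer_threshold`, split exams into confident normal, confident suspicious, and uncertain (referred). Confident-suspicious exams are simply counted as positive, which simplifies the post-hoc cancer detection step; the study's actual handling may differ. Weights are assumed to be the ratio of each study type's share in the programme to its share in the enriched sample, and the triage rate is read here as the weighted share of all exams that are normal and auto-classified as normal by the AI.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Exam:
    study_type: str              # hypothetical label, e.g. "cancer" or "normal"
    ai_score: float              # hypothetical AI malignancy score in [0, 1]
    radiologist_positive: bool   # original individual radiologist decision
    is_cancer: bool              # reference standard


def enrichment_weights(sample_counts: Dict[str, int],
                       population_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight per study type = programme share / enriched-sample share (assumed scheme)."""
    n_sample = sum(sample_counts.values())
    n_pop = sum(population_counts.values())
    return {t: (population_counts[t] / n_pop) / (sample_counts[t] / n_sample)
            for t in sample_counts}


def decision_referral(score: float, radiologist_positive: bool,
                      normal_threshold: float,
                      cancer_threshold: float) -> Tuple[bool, bool]:
    """Return (final call is positive, exam was triaged as normal by the AI)."""
    if score < normal_threshold:
        return False, True                # confident normal: AI decides alone
    if score >= cancer_threshold:
        return True, False                # confident suspicious: assumed positive
    return radiologist_positive, False    # uncertain: referred to the radiologist


def evaluate(exams: List[Exam], weights: Dict[str, float],
             normal_threshold: float, cancer_threshold: float) -> Dict[str, float]:
    """Weighted sensitivity, specificity and triage rate of the combined reading."""
    tp = pos = tn = neg = triaged = total = 0.0
    for e in exams:
        w = weights[e.study_type]
        call, triaged_normal = decision_referral(
            e.ai_score, e.radiologist_positive, normal_threshold, cancer_threshold)
        total += w
        if e.is_cancer:
            pos += w
            tp += w * call
        else:
            neg += w
            tn += w * (not call)
            triaged += w * triaged_normal  # normal exam correctly auto-classified
    return {"sensitivity": tp / pos, "specificity": tn / neg,
            "triage_rate": triaged / total}


if __name__ == "__main__":
    # Toy enriched sample: cancers are 50% here but 1% in the hypothetical programme.
    weights = enrichment_weights({"cancer": 2, "normal": 2},
                                 {"cancer": 1, "normal": 99})
    exams = [
        Exam("cancer", 0.95, False, True),   # missed by radiologist, caught by AI
        Exam("cancer", 0.50, True, True),    # uncertain: radiologist decides
        Exam("normal", 0.02, True, False),   # radiologist false positive, AI triages
        Exam("normal", 0.40, False, False),  # uncertain: radiologist decides
    ]
    # Decision referral: sensitivity and specificity 1.0, triage_rate about 0.495.
    print(evaluate(exams, weights, normal_threshold=0.1, cancer_threshold=0.9))
    # Radiologist alone: no exam falls outside the referral band.
    print(evaluate(exams, weights, normal_threshold=0.0, cancer_threshold=float("inf")))
```

Setting `normal_threshold=0.0` and `cancer_threshold=float("inf")` sends every exam to the radiologist, while setting both thresholds to the same value yields a standalone-AI configuration, so one function covers all three readings compared in the study. When every exam within a class shares one weight, weighting leaves sensitivity and specificity unchanged; the weights matter for quantities pooled across classes, such as the triage rate, or when a class mixes several study types.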
Sensitivity across clinically relevant subgroups, screening sites, and device manufacturers was compared between standalone AI, the radiologist, and decision referral. We present receiver operating characteristic (ROC) curves and the area under the ROC curve (AUROC) to evaluate AI-system performance over its entire operating range. Comparison with radiologists and subgroup analysis were based on sensitivity and specificity at clinically relevant configurations. Findings: The exemplary configuration of the AI system in standalone mode achieved a sensitivity of 84·2% (95% CI 82·4-85·8) and a specificity of 89·5% (89·0-89·9) on internal-test data, and a sensitivity of 84·6% (83·3-85·9) and a specificity of 91·3% (91·1-91·5) on external-test data, but was less accurate than the average unaided radiologist. By contrast, the simulated decision-referral approach significantly improved upon radiologist sensitivity by 2·6 percentage points and specificity by 1·0 percentage points, corresponding to a triaging performance of 63·0% on the external dataset; the AUROC was 0·982 (95% CI 0·978-0·986) on the subset of studies assessed by AI, surpassing radiologist performance. The decision-referral approach also yielded significant increases in sensitivity for a number of clinically relevant subgroups, including subgroups of small lesion sizes and invasive carcinomas. Sensitivity of the decision-referral approach was consistent across the eight included screening sites and three device manufacturers. Interpretation: The decision-referral approach leverages the strengths of both the radiologist and AI, demonstrating improvements in sensitivity and specificity surpassing those of the individual radiologist and of the standalone AI system.
This approach has the potential to improve the screening accuracy of radiologists, can be adapted to the requirements of screening, and could reduce workload ahead of the consensus conference without discarding the generalised knowledge of radiologists. Funding: Vara.
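The AUROC reported in the Findings, including the value restricted to the subset of studies assessed by AI, can be estimated with a weighted Mann-Whitney statistic: the weighted probability that a randomly drawn cancer exam scores higher than a randomly drawn normal exam, with ties counted as one half. The sketch below is a minimal illustration, not necessarily the estimator the authors used; the helper `ai_assessed` and its two thresholds are hypothetical and mirror a two-threshold referral rule in which exams inside the uncertain band go to the radiologist.

```python
from typing import List, Sequence


def weighted_auroc(scores: Sequence[float], labels: Sequence[int],
                   weights: Sequence[float]) -> float:
    """Weighted Mann-Whitney estimate of the area under the ROC curve.

    labels: 1 for cancer, 0 for normal; weights: e.g. enrichment weights.
    """
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one cancer and one normal exam")
    concordant = 0.0
    for s_p, w_p in pos:
        for s_n, w_n in neg:
            if s_p > s_n:
                concordant += w_p * w_n          # cancer ranked above normal
            elif s_p == s_n:
                concordant += 0.5 * w_p * w_n    # ties count one half
    total_pairs = sum(w for _, w in pos) * sum(w for _, w in neg)
    return concordant / total_pairs


def ai_assessed(scores: Sequence[float], normal_threshold: float,
                cancer_threshold: float) -> List[int]:
    """Indices of exams the AI decides alone under a hypothetical two-threshold rule."""
    return [i for i, s in enumerate(scores)
            if s < normal_threshold or s >= cancer_threshold]


if __name__ == "__main__":
    scores = [0.95, 0.50, 0.05, 0.60]
    labels = [1, 1, 0, 0]
    weights = [1.0, 1.0, 1.0, 1.0]
    print(weighted_auroc(scores, labels, weights))  # all exams: prints 0.75
    subset = ai_assessed(scores, normal_threshold=0.1, cancer_threshold=0.9)
    print(weighted_auroc([scores[i] for i in subset],
                         [labels[i] for i in subset],
                         [weights[i] for i in subset]))  # AI-assessed subset: prints 1.0
```

The toy example shows why an AUROC computed on the AI-assessed subset describes a different population from one computed over all exams: the uncertain middle band is excluded. The double loop is quadratic in the number of exams; for datasets of the size described here, scikit-learn's `roc_auc_score`, which accepts a `sample_weight` argument, computes the weighted area by sorting instead.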

Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis
