Abstract: Breast cancer remains a global challenge, causing over 600,000 deaths in 2018 (ref. 1). To achieve earlier cancer detection, health organizations worldwide recommend screening mammography, which is estimated to decrease breast cancer mortality by 20–40% (refs. 2,3). Despite the clear value of screening mammography, significant false positive and false negative rates along with non-uniformities in expert reader availability leave opportunities for improving quality and access (refs. 4,5). To address these limitations, there has been much recent interest in applying deep learning to mammography (refs. 6–18), and these efforts have highlighted two key difficulties: obtaining large amounts of annotated training data and ensuring generalization across populations, acquisition equipment and modalities. Here we present an annotation-efficient deep learning approach that (1) achieves state-of-the-art performance in mammogram classification, (2) successfully extends to digital breast tomosynthesis (DBT; '3D mammography'), (3) detects cancers in clinically negative prior mammograms of patients with cancer, (4) generalizes well to a population with low screening rates and (5) outperforms five out of five full-time breast-imaging specialists with an average increase in sensitivity of 14%.
By creating new 'maximum suspicion projection' (MSP) images from DBT data, our progressively trained, multiple-instance learning approach effectively trains on DBT exams using only breast-level labels while maintaining localization-based interpretability. Altogether, our results demonstrate promise towards software that can improve the accuracy of and access to screening mammography worldwide.
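
The abstract does not spell out how MSP images are built, so the following is only an illustrative sketch and not the authors' implementation. It assumes that a 2D detector has already produced scored bounding boxes on individual DBT slices, that each pixel of the projection is taken from the slice whose covering box has the highest score, and that pixels covered by no box fall back to the center slice. The function names, the box convention and the fallback rule are all hypothetical. The second function shows the usual multiple-instance idea of scoring a breast by its most suspicious detection, which is one way breast-level labels alone can supervise training.

```python
from typing import List, Tuple

Slice = List[List[float]]
Box = Tuple[int, int, int, int]      # (row0, col0, row1, col1), end-exclusive (assumed convention)
Detection = Tuple[int, Box, float]   # (slice index, box, suspicion score)


def max_suspicion_projection(volume: List[Slice], detections: List[Detection]):
    """Collapse a stack of slices into one 2D image (hypothetical sketch).

    Each pixel is copied from the slice whose covering box has the highest
    score; pixels outside every box keep the center-slice value (assumption).
    Returns the projected image and the per-pixel best score.
    """
    rows, cols = len(volume[0]), len(volume[0][0])
    center = volume[len(volume) // 2]
    image = [row[:] for row in center]  # start from a copy of the center slice
    best = [[float("-inf")] * cols for _ in range(rows)]
    for slice_idx, (r0, c0, r1, c1), score in detections:
        for r in range(r0, r1):
            for c in range(c0, c1):
                if score > best[r][c]:
                    best[r][c] = score
                    image[r][c] = volume[slice_idx][r][c]
    return image, best


def breast_level_score(detections: List[Detection]) -> float:
    """Multiple-instance max pooling: the breast is as suspicious as its top box."""
    return max((score for _, _, score in detections), default=0.0)


# Toy example: three 2x3 slices filled with 0, 1 and 2 respectively.
volume = [[[float(v)] * 3 for _ in range(2)] for v in range(3)]
detections = [(0, (0, 0, 1, 2), 0.9), (2, (0, 1, 2, 3), 0.4)]
image, best = max_suspicion_projection(volume, detections)
```

In this toy case the higher-scoring box from slice 0 wins where the two boxes overlap, and the one pixel outside both boxes keeps the center-slice value. Because the per-pixel score map is retained, the box behind a high breast-level score can still be located, which is the kind of localization-based interpretability the abstract refers to.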

Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography

Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography

Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography

RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features

CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images

Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model

Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach

Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training

Applying Deep Learning Methods for Mammography Analysis and Breast Cancer Detection

FACMIC: Federated Adaptative CLIP Model for Medical Image Classification

EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis

VisionCLIP: An Med-AIGC based Ethical Language-Image Foundation Model for Generalizable Retina Image Analysis

MammoDG: Generalisable Deep Learning Breaks the Limits of Cross-Domain Multi-Center Breast Cancer Screening

Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography

CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

Medical Vision-Language Pre-Training for Brain Abnormalities