Abstract: Breast cancer is the second most prevalent cancer in women, is recognized as one of the most dangerous types of cancer, and is on the rise globally. Regular screening is essential for early detection and treatment. Digital mammography (DM) is the most recognized and widely used technique for breast cancer screening. Contrast-Enhanced Spectral Mammography (CESM or CM) is used in conjunction with DM to detect and identify hidden abnormalities, particularly in dense breast tissue, where DM alone may be less effective. In this work, we explore the effectiveness of each modality (CM, DM, or both) in detecting breast cancer lesions using deep learning methods. We introduce an architecture for detecting and classifying breast cancer lesions in DM and CM images acquired in the Craniocaudal (CC) and Mediolateral Oblique (MLO) views. The proposed architecture, JointNet, consists of a convolution module that extracts local features, a transformer module that extracts long-range (global) features, and a feature fusion layer that fuses the local features, the global features, and the global features weighted by the local ones. This design significantly improves accuracy both in classifying DM and CM images as normal or abnormal and in classifying lesions as benign or malignant. Using JointNet as a backbone, we introduce three lesion classification pipelines that use attention mechanisms focused on lesion shape and texture and on overall breast texture, to examine which features are critical for effective lesion classification. The results demonstrate that the proposed methods outperform their individual components in classifying images as normal or abnormal and mitigate the limitations of using the transformer module or the convolution module alone. We also introduce an ensemble model that explores the contribution of each modality and each view and further increases the accuracy of the baseline architecture. The results show superior performance compared with similar works.
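The abstract does not specify how the fusion layer weights the global features by the local ones. As a minimal sketch only, assuming pooled feature vectors of equal length, an element-wise sigmoid gate derived from the local features, and plain concatenation as the fusion operation (all three are assumptions, and `fuse_features` is a hypothetical name rather than the authors' code), the three-way fusion could look like this:

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, used here as a (0, 1) gate."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse_features(local: List[float], global_: List[float]) -> List[float]:
    """Hypothetical JointNet-style fusion of pooled feature vectors.

    Assumptions (not taken from the source): both vectors are already pooled
    to the same length, the "weighting" is an element-wise sigmoid gate
    computed from the local (convolutional) features, and the fusion itself
    is concatenation.
    """
    if len(local) != len(global_):
        raise ValueError("local and global feature vectors must have equal length")
    # Gate each global (transformer) feature by a weight from the matching local feature.
    weights = [sigmoid(x) for x in local]
    weighted_global = [g * w for g, w in zip(global_, weights)]
    # Concatenate [local | global | global weighted by local].
    return list(local) + list(global_) + weighted_global


if __name__ == "__main__":
    local_feats = [0.0, 2.0, -2.0]
    global_feats = [1.0, 1.0, 1.0]
    fused = fuse_features(local_feats, global_feats)
    print(len(fused))  # 9
    print(fused[6])    # 0.5, because sigmoid(0.0) = 0.5
```

Concatenation keeps all three feature groups available to the classification head; a learned projection or an attention-based weighting would be an equally plausible reading of the description.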
The best performance on DM images was achieved with the semi-automatic Attention on Lesion (AOL) Lesion Classification Pipeline, yielding an accuracy of 98.85 %, AUROC of 0.9965, F1-score of 98.85 %, precision of 98.85 %, and specificity of 98.85 %. For CM images, the highest results were obtained with the automatic AOL Lesion Classification Pipeline, with an accuracy of 97.47 %, AUROC of 0.9771, F1-score of 97.34 %, precision of 94.45 %, and specificity of 97.23 %. When both DM and CM images were used, the semi-automatic ensemble AOL Classification Pipeline gave the best overall performance, with an accuracy of 94.74 %, F1-score of 97.67 %, specificity of 93.75 %, and sensitivity of 95.45 %. Furthermore, we compare the effectiveness of CM and DM images in deep learning models: although CM images offer clearer insights to the human eye, our model trained on DM images yields better results with AOL techniques. The results also suggest that a multimodal approach combining DM and CM images with ensemble learning could provide more robust classification outcomes.
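For reference, the threshold-based metrics reported above follow the standard confusion-matrix definitions. The sketch below assumes that malignant is the positive class and uses made-up counts that are not the paper's results; `Confusion` and `binary_metrics` are hypothetical names. AUROC is omitted because it is computed from continuous scores rather than from a single confusion matrix.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Binary confusion-matrix counts; 'positive' is assumed to mean malignant."""
    tp: int
    fp: int
    tn: int
    fn: int


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> Dict[str, float]:
    """Standard threshold-based metrics for a binary classifier."""
    precision = safe_div(c.tp, c.tp + c.fp)
    sensitivity = safe_div(c.tp, c.tp + c.fn)  # recall / true-positive rate
    specificity = safe_div(c.tn, c.tn + c.fp)  # true-negative rate
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; they do not correspond to any reported result.
    metrics = binary_metrics(Confusion(tp=8, fp=2, tn=9, fn=1))
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Since F1 is the harmonic mean of precision and sensitivity, its value always lies between those two quantities.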

Enhancing Multimodal Medical Image Classification using Cross-Graph Modal Contrastive Learning

Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features

SIGxCL: A Signal-Image-Graph Cross-Modal Contrastive Learning Framework for CVD Diagnosis Based on Internet of Medical Things

Cross‐Modal Graph Contrastive Learning with Cellular Images

Medical Multimodal Classifiers Under Scarce Data Condition

Deep Multimodal Guidance for Medical Image Classification

Gradient modulated contrastive distillation of low-rank multi-modal knowledge for disease diagnosis

Multimodal Multilabel Classification by CLIP

Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph

Contrastive Learning on Multimodal Analysis of Electronic Health Records

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Graph-Based Intercategory and Intermodality Network for Multilabel Classification and Melanoma Diagnosis of Skin Lesions in Dermoscopy and Clinical Images

ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics

Heterogeneous Graph Learning for Multi-modal Medical Data Analysis

Multi-modal classification of breast cancer lesions in Digital Mammography and contrast enhanced spectral mammography images

Triplet attention and dual-pool contrastive learning for clinic-driven multi-label medical image classification

Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration

MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging

Cross-Modal Information Maximization for Medical Imaging: CMIM