Diagnosis of breast cancer molecular subtypes using machine learning models on unimodal and multimodal datasets

Samta Rani, Tanvir Ahmad, Sarfaraz Masood, Chandni Saxena
DOI: https://doi.org/10.1007/s00521-023-09005-x
2023-09-19
Neural Computing and Applications
Abstract: Breast cancer is a significant global health concern, with millions of cases and deaths each year. Accurate diagnosis is critical for timely treatment and medication. Machine learning techniques have shown promising results in detecting breast cancer, but previous studies have primarily used single-modality data for diagnosis. This work therefore aims to exploit the benefits of multimodal data over unimodal samples and proposes a custom deep learning-based model pipeline that operates on multimodal data. The work is divided into three phases. Phases 1 and 2, in the unimodal category, examine gene expression data and histopathological images separately; both datasets are available from The Cancer Genome Atlas. In Phase 3, in the multimodal category, the proposed pipeline operates on samples of both data types for each patient. The study investigates how data pre-processing (cleaning, transformation, reduction) and cascaded filtering affect model performance. The models were assessed using precision, recall, F1-score, and accuracy, while L2 regularization, exponentially weighted moving average, and transfer learning were used to minimize over-fitting. A custom deep neural network and a support vector machine obtained 86% accuracy in Phase 1, whereas the VGG16 model reached 80.21% accuracy in Phase 2. In Phase 3, the curated multimodal dataset was applied to a custom deep learning pipeline (VGG16 backbone with hyper-tuned machine learning models as head classifiers) and achieved 94% accuracy. These findings highlight the importance of multimodal over unimodal data for breast cancer diagnosis and subtype classification.
computer science, artificial intelligence