Enhancing Multimodal Medical Image Classification using Cross-Graph Modal Contrastive Learning

Jun-En Ding, Chien-Chin Hsu, Feng Liu
2024-10-23
Abstract: The classification of medical images is a pivotal aspect of disease diagnosis, often enhanced by deep learning techniques. However, traditional approaches typically focus on unimodal medical image data, neglecting the integration of diverse non-image patient data. This paper proposes a novel Cross-Graph Modal Contrastive Learning (CGMCL) framework for multimodal medical image classification. The model effectively integrates both image and non-image data by constructing cross-modality graphs and leveraging contrastive learning to align multimodal features in a shared latent space. An inter-modality feature scaling module further optimizes the representation learning process by reducing the gap between heterogeneous modalities. The proposed approach is evaluated on two datasets: a Parkinson's disease (PD) dataset and a public melanoma dataset. Results demonstrate that CGMCL outperforms conventional unimodal methods in accuracy, interpretability, and early disease prediction. Additionally, the method shows superior performance in multi-class melanoma classification. The CGMCL framework provides valuable insights into medical image classification while offering improved disease interpretability and predictive capabilities.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses a key limitation in medical image classification. Traditional methods typically use only single-modal data (such as a single type of medical image) and neglect non-image patient data (such as electronic health records or blood test results). As a result, they cannot fully exploit the information that multiple data sources provide for disease diagnosis, which limits both the accuracy and the interpretability of the classification.

To overcome this limitation, the paper proposes a new Cross-Graph Modal Contrastive Learning (CGMCL) framework. The framework integrates image and non-image data by constructing cross-modality graphs and applying contrastive learning, which aligns the multimodal features in a shared latent space. An inter-modality feature scaling module further optimizes the representation learning process by reducing the gap between heterogeneous modalities.

The paper evaluates the framework on two datasets: a Parkinson's disease (PD) dataset and a public melanoma dataset. Experimental results show that CGMCL outperforms traditional single-modal methods in accuracy, interpretability, and early disease prediction, and that it performs particularly well on the multi-class melanoma classification task. The framework therefore improves medical image classification while also offering better disease interpretability and predictive capability.
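
To make the idea of aligning two modalities in a shared latent space more concrete, the following is a minimal sketch of a symmetric InfoNCE-style contrastive loss between image embeddings and non-image embeddings. It is an illustration only and not the paper's actual loss: the function names, the `temperature` value, and the use of cosine similarity are all assumptions, and the embeddings are assumed to come from some upstream encoders (for example, graph encoders over each modality) that are not shown here.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def cross_modal_contrastive_loss(image_emb, non_image_emb, temperature=0.1):
    """Hypothetical symmetric InfoNCE loss between two modalities.

    image_emb[i] and non_image_emb[i] are assumed to describe the same
    patient (a positive pair); every other pairing in the batch is treated
    as a negative. This is a generic contrastive objective, not the exact
    formulation used by CGMCL.
    """
    if not image_emb or len(image_emb) != len(non_image_emb):
        raise ValueError("both modalities need the same non-zero batch size")

    img = [l2_normalize(v) for v in image_emb]
    tab = [l2_normalize(v) for v in non_image_emb]
    n = len(img)

    # sim[i][j]: temperature-scaled cosine similarity between
    # image sample i and non-image sample j.
    sim = [
        [sum(a * b for a, b in zip(img[i], tab[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(logits, target):
        # Numerically stable -log(softmax(logits)[target]).
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        return log_sum - logits[target]

    # Image -> non-image direction uses the rows of sim,
    # non-image -> image direction uses the columns.
    img_to_tab = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    tab_to_img = sum(
        cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (img_to_tab + tab_to_img)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[2.0, 0.0], [0.0, 3.0]]   # same directions as the images
    swapped = [[0.0, 3.0], [2.0, 0.0]]   # pairs deliberately mismatched
    print(cross_modal_contrastive_loss(images, matched))
    print(cross_modal_contrastive_loss(images, swapped))
```

In this toy example the loss is close to zero when each image embedding points in the same direction as its paired non-image embedding, and it is much larger when the pairs are swapped. Minimizing such an objective is one common way to pull paired multimodal features together in a shared space.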