Multi-modal Multi-instance Learning Using Weakly Correlated Histopathological Images and Tabular Clinical Information

Hang Li, Fan Yang, Xiaohan Xing, Yu Zhao, Jun Zhang, Yueping Liu, Mengxue Han, Junzhou Huang, Liansheng Wang, Jianhua Yao
DOI: https://doi.org/10.1007/978-3-030-87237-3_51
2021-01-01
Abstract: The fusion of heterogeneous medical data is essential in precision medicine to assist medical experts in treatment decision-making. However, there is often little explicit correlation between data from different modalities, such as histopathological images and tabular clinical data. In addition, attention-based multi-instance learning (MIL) often lacks sufficient supervision to assign appropriate attention weights to informative image patches and, in turn, to generate a good global representation of the whole image. In this paper, we propose a novel multi-modal multi-instance joint learning method, which fuses different modalities and magnification scales into a cross-modal representation to capture potentially complementary information and recalibrate the features in each modality. Furthermore, we leverage the information from tabular clinical data to optimize the MIL bag representation in the imaging modality. The proposed method is evaluated on a challenging medical task, lymph node metastasis (LNM) prediction in breast cancer, and achieves state-of-the-art performance with an AUC of 0.8844, outperforming the AUC of 0.7111 obtained with histopathological images alone and the AUC of 0.8312 obtained with tabular clinical data alone. An open-source implementation of our approach can be found at https://github.com/yfzon/Multimodal-Multi-instance-Learning.
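For readers unfamiliar with attention-based MIL, the following is a minimal sketch of generic attention pooling over a bag of patch features, followed by a naive concatenation with tabular features. It is not the authors' method: the function names, the parameters `V` and `w`, the toy inputs, and the concatenation step are all assumptions made for illustration, and the paper's cross-modal fusion and recalibration are not reproduced here.

```python
import math

def attention_mil_pool(instances, V, w):
    """Pool a bag of instance (patch) feature vectors into one bag vector.

    instances: list of N feature vectors, each of length D
    V: hypothetical projection parameters, H rows of length D
    w: hypothetical scoring parameters, length H
    Returns (bag_vector, attention_weights).
    """
    scores = []
    for h in instances:
        # Hidden activation tanh(V h), then a scalar score w . tanh(V h)
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(a * z for a, z in zip(w, hidden)))
    # Numerically stable softmax over the instances in the bag
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag representation is the attention-weighted sum of instance features
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights

def fuse_with_tabular(bag, tabular):
    """Naive fusion by concatenation (an illustrative stand-in only)."""
    return list(bag) + list(tabular)

# Toy bag of three patches with two features each (made-up values)
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[0.5, -0.5], [0.25, 0.75]]
w = [1.0, -1.0]
bag, weights = attention_mil_pool(patches, V, w)
fused = fuse_with_tabular(bag, [0.3, 1.0, 0.0])
```

In this sketch the attention weights are learned only through the bag-level label, which is the weak supervision the abstract refers to; the paper's contribution is to use the clinical data to guide that bag representation rather than simply appending it.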