MAL: Multi-modal Attention Learning for Tumor Diagnosis Based on Bipartite Graph and Multiple Branches

Menglei Jiao, Hong Liu, Jianfang Liu, Hanqiang Ouyang, Xiangdong Wang, Liang Jiang, Huishu Yuan, Yueliang Qian
DOI: https://doi.org/10.1007/978-3-031-16437-8_17
2022-01-01
Abstract: Multi-modal fusion of medical images has been widely used in recent years. Most methods focus on images in a single plane, such as the axial plane with different sequences (T1, T2) or different modalities (CT, MRI), rather than on multiple planes with or without different modalities. Further, most methods perform segmentation or classification at the image or sequence level rather than at the patient level. This paper proposes a general and scalable framework named MAL for classifying tumors as benign or malignant at the patient level based on multi-modal attention learning. A bipartite graph is used to model the correlations between different modalities, and modal fusion is then carried out in feature space by attention learning and multi-branch networks. Thereafter, multi-instance learning is adopted to obtain patient-level diagnostic results by treating the different modal pairs of a patient's images as bags and the edges of the bipartite graph as instances. The modal and intra-type similarity losses at the patient level are calculated from the feature similarity matrix to encourage the model to extract highly correlated, high-level semantic features. Experimental results confirm the effectiveness of MAL on three datasets covering different multi-modal fusion tasks: axial and sagittal MRI, axial CT and sagittal MRI, and T1 and T2 MRI sequences. Applying MAL can also significantly improve the diagnostic accuracy and efficiency of doctors.
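The bag and instance construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a complete bipartite graph between the images of two modalities, uses max pooling as a stand-in multi-instance aggregator, and replaces the attention and multi-branch networks with a toy scoring function. All names (`bipartite_edges`, `toy_instance_score`, `bag_prediction`) and the example scores are hypothetical.

```python
from itertools import product


def bipartite_edges(modality_a, modality_b):
    """Pair every image of one modality with every image of the other.

    Assumption: the graph is complete bipartite; the abstract does not
    state how edges are selected.
    """
    return list(product(modality_a, modality_b))


def toy_instance_score(edge, image_scores):
    """Hypothetical stand-in for the fused-feature classifier.

    Averages two per-image malignancy scores; the paper instead fuses
    features with attention learning and multi-branch networks.
    """
    a, b = edge
    return (image_scores[a] + image_scores[b]) / 2


def bag_prediction(instance_scores):
    """Max pooling over instances (an assumed aggregator, not the paper's)."""
    return max(instance_scores)


# Hypothetical patient with 2 axial and 3 sagittal MRI slices.
axial = ["ax_0", "ax_1"]
sagittal = ["sag_0", "sag_1", "sag_2"]
image_scores = {"ax_0": 0.2, "ax_1": 0.8, "sag_0": 0.1, "sag_1": 0.6, "sag_2": 0.4}

edges = bipartite_edges(axial, sagittal)          # instances of the bag
scores = [toy_instance_score(e, image_scores) for e in edges]
patient_score = bag_prediction(scores)            # patient-level result
print(len(edges), patient_score)
```

Each edge pairs one image from each modality, so a patient with 2 axial and 3 sagittal slices yields a bag of 6 instances, and the patient-level score is derived from the bag as a whole rather than from any single image.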