BAF-Net: bidirectional attention-aware fluid pyramid feature integrated multimodal fusion network for diagnosis and prognosis
Huiqin Wu,Lihong Peng,Dongyang Du,Hui Xu,Guoyu Lin,Zidong Zhou,Lijun Lu,Wenbing Lv
DOI: https://doi.org/10.1088/1361-6560/ad3cb2
IF: 3.5
2024-04-11
Physics in Medicine and Biology
Abstract:Objective: To go beyond the deficiencies of the three conventional multimodal fusion strategies (i.e., input-, feature- and output-level fusion), we propose a bidirectional attention-aware fluid pyramid feature integrated fusion network (BAF-Net) with cross-modal interactions for multimodal medical image diagnosis and prognosis.
Approach: BAF-Net is composed of two identical branches to preserve the unimodal features and one bidirectional attention-aware distillation stream to progressively assimilate cross-modal complements and to learn supplementary features in both bottom-up and top-down processes. Fluid pyramid connections were adopted to integrate the hierarchical features at different levels of the network, and channel-wise attention modules were exploited to mitigate cross-modal cross-level incompatibility. Furthermore, depth-wise separable convolution was introduced to fuse the cross-modal cross-level features to alleviate the increase in parameters to a great extent. The generalization abilities of BAF-Net were evaluated in terms of two clinical tasks: (1) An in-house PET-CT dataset with 174 patients for differentiation between lung cancer and pulmonary tuberculosis. (2) A public multicenter PET-CT head and neck cancer dataset with 800 patients from nine centers for overall survival prediction.
Main results: On the LC-PTB dataset, improved performance was found in BAF-Net (AUC = 0.7342) compared with input-level fusion model (AUC = 0.6825; p < 0.05), feature-level fusion model (AUC = 0.6968; p = 0.0547), output-level fusion model (AUC = 0.7011; p < 0.05). On the H&N cancer dataset, BAF-Net (C-index = 0.7241) outperformed the input-, feature-, and output-level fusion model, with 2.95%, 3.77%, and 1.52% increments of C-index (p = 0.3336, 0.0479 and 0.2911, respectively). The ablation experiments demonstrated the effectiveness of all the designed modules regarding all the evaluated metrics in both datasets.
Significance: Extensive experiments on two datasets demonstrated better performance and robustness of BAF-Net than three conventional fusion strategies and PET or CT unimodal network in terms of diagnosis and prognosis.
engineering, biomedical,radiology, nuclear medicine & medical imaging