Abstract: Multimodal magnetic resonance imaging (MRI) provides complementary information about targets, and the segmentation of multimodal MRI is widely used as an essential preprocessing step for initial diagnosis, stage differentiation, and post-treatment efficacy evaluation in clinical practice. Whether the goal is to segment one main modality or every modality, it is important to enhance the visual information by modeling the connections among modalities and effectively fusing their features. However, existing multimodal segmentation methods share a drawback: they inadvertently discard information specific to individual modalities during fusion. Recently, graph learning-based methods have been applied to segmentation and have achieved considerable improvements by modeling the relationships across feature regions and reasoning with global information. In this paper, we propose a graph learning-based approach that efficiently extracts modality-specific features and establishes regional correspondence among all modalities. Specifically, we first project features into a graph domain and employ graph convolution to propagate information across all regions, which yields global modality-specific features. We then propose a mutual information-based graph co-attention module that learns the weight coefficients of a bipartite graph constructed from the fully connected graphs of the different modalities in the graph domain, and that selectively fuses the node features. Based on the deformation diagram between the spatial and graph spaces and our proposed graph co-attention module, we present a multimodal prior-guided segmentation framework with two strategies for two clinical situations: the Modality-Specific Learning Strategy and the Co-Modality Learning Strategy. In addition, an improved Co-Modality Learning Strategy uses trainable weights in the multi-task loss to optimize the proposed framework.
We validated the proposed modules and frameworks on two multimodal MRI datasets: our private liver lesion dataset and a public prostate zone dataset. Experimental results on both datasets demonstrate the superiority of the proposed approaches.
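
The pipeline outlined in the abstract (project features into a graph domain, propagate with graph convolution, fuse two modalities over a bipartite graph, then map back to the spatial domain) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. All function names, shapes, and the random inputs are hypothetical, and plain scaled dot-product attention is swapped in for the mutual information-based weighting, whose exact form the abstract does not specify.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def project_to_graph(features, assign_logits):
    # features: (HW, C) pixel features; assign_logits: (HW, K) pixel-to-node scores.
    assign = softmax(assign_logits, axis=1)           # soft assignment of pixels to K nodes
    nodes = assign.T @ features                       # (K, C) weighted feature sums
    nodes = nodes / (assign.sum(axis=0)[:, None] + 1e-8)  # weighted mean per node
    return nodes, assign

def graph_conv(nodes, weight):
    # One graph convolution on a fully connected graph whose adjacency is
    # a row-normalised node similarity matrix (an assumption of this sketch).
    adj = softmax(nodes @ nodes.T, axis=1)            # (K, K)
    return np.maximum(adj @ nodes @ weight, 0.0)      # ReLU activation

def co_attention_fuse(main_nodes, aux_nodes):
    # Bipartite graph between the two modality graphs: every main node is
    # connected to every auxiliary node. Dot-product attention stands in for
    # the mutual information-based edge weights described in the abstract.
    scores = main_nodes @ aux_nodes.T / np.sqrt(main_nodes.shape[1])
    weights = softmax(scores, axis=1)                 # (K_main, K_aux) edge weights
    fused = main_nodes + weights @ aux_nodes          # residual fusion keeps main-modality features
    return fused, weights

def reproject(nodes, assign):
    # Map node features back to the spatial domain with the same assignment.
    return assign @ nodes                             # (HW, C)

# Toy example with random stand-ins for two modality feature maps.
rng = np.random.default_rng(0)
hw, c, k = 64, 8, 4
feat_main, feat_aux = rng.normal(size=(hw, c)), rng.normal(size=(hw, c))
w_main, w_aux = rng.normal(size=(c, c)), rng.normal(size=(c, c))

nodes_main, assign_main = project_to_graph(feat_main, rng.normal(size=(hw, k)))
nodes_aux, _ = project_to_graph(feat_aux, rng.normal(size=(hw, k)))
nodes_main = graph_conv(nodes_main, w_main)
nodes_aux = graph_conv(nodes_aux, w_aux)
fused, weights = co_attention_fuse(nodes_main, nodes_aux)
out = reproject(fused, assign_main)
print(fused.shape, weights.shape, out.shape)
```

The residual form of the fusion step is one simple way to avoid dropping the main modality's own features, which is the drawback the abstract attributes to existing fusion methods. In a trained model the assignment scores and convolution weights would be learned rather than sampled at random.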

Multimodal Co-Attention Mechanism for One-stage Visual Grounding.

Visual Grounding With Joint Multimodal Representation and Interaction

Visual-Semantic Graph Matching for Visual Grounding

Mutual Information-Based Graph Co-Attention Networks for Multimodal Prior-Guided Magnetic Resonance Imaging Segmentation

End-to-end Visual Grounding Based on Query Text Guidance and Multi-stage Reasoning

Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding

Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution

Desipramine side-effect.

Bridging Modality Gap for Visual Grounding with Effective Cross-modal Distillation

Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos

Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision

Cross-Modal Match for Language Conditioned 3D Object Grounding

Multi-scale unsupervised network for infrared and visible image fusion based on joint attention mechanism

HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding

Weakly Supervised Multimodal Affordance Grounding for Egocentric Images

Multimodal Unified Attention Networks for Vision-and-Language Interactions

A Visual Attention Grounding Neural Model for Multimodal Machine Translation

Multimodal Fusion Method Based on Self-Attention Mechanism

A Fast And Accurate One-Stage Approach To Visual Grounding

Transformer-based Visual Grounding with Cross-modality Interaction

Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models