One-Versus-Others Attention: Scalable Multimodal Integration for Biomedical Data

Michal Golovanevsky, Eva Schiller, Akira Nair, Eric Han, Ritambhara Singh, Carsten Eickhoff

2024-10-22

Abstract: Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically fewer than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, pairwise attention requires $\binom{n}{2}$ operations, potentially demanding considerable computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.
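The scaling argument in the abstract can be made concrete with a minimal NumPy sketch. It is a simplification under stated assumptions, not the authors' exact formulation: every modality is assumed to be pre-encoded to the same shape `(tokens, d)`, "the others" for modality `i` are summarized by averaging the remaining embeddings, and that summary is projected by a single weight matrix `w` shared across all modalities. The names `ovo_attention` and `attention_op_counts` are hypothetical.

```python
import math

import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def ovo_attention(modalities: list[np.ndarray], w: np.ndarray) -> list[np.ndarray]:
    # Hypothetical sketch: all modalities share the shape (tokens, d), and
    # `w` is one (d, d) matrix shared by every modality.
    n = len(modalities)
    if n < 2:
        raise ValueError("one-versus-others attention needs at least two modalities")
    d = modalities[0].shape[-1]
    outputs = []
    for i, m_i in enumerate(modalities):
        # Summarize "the others" by averaging every modality except i.
        others = sum(m for j, m in enumerate(modalities) if j != i) / (n - 1)
        # The projected summary acts as the query; modality i supplies keys and values.
        scores = (others @ w) @ m_i.T / math.sqrt(d)
        outputs.append(softmax(scores, axis=-1) @ m_i)
    # Exactly one attention operation per modality: n in total.
    return outputs


def attention_op_counts(n: int) -> tuple[int, int]:
    # Returns (pairwise cross-modal attention ops, one-versus-others attention ops).
    return math.comb(n, 2), n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens, d = 3, 8
    mods = [rng.normal(size=(tokens, d)) for _ in range(4)]
    w = rng.normal(size=(d, d)) / math.sqrt(d)
    fused = ovo_attention(mods, w)
    print(len(fused), fused[0].shape)  # prints: 4 (3, 8)
    for n in (2, 3, 4, 6, 10):
        print(n, attention_op_counts(n))
```

For $n = 3$ both schemes need three attention operations; from four modalities on, the pairwise count $\binom{n}{2}$ overtakes $n$ (6 vs. 4 at $n = 4$, 15 vs. 6 at $n = 6$), which is the gap the abstract describes.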

Machine Learning


### What problem does this paper attempt to solve?

This paper addresses the efficient integration of multimodal data in the biomedical field. Existing multimodal models suffer from high computational cost and poor scalability when handling data from more than three modalities, because pairwise cross-modal attention requires $\binom{n}{2}$ attention operations for $n$ modalities. The problem is particularly pronounced in healthcare, where patient data typically span many modalities, such as X-rays, PET scans, MRIs, genetic screening, genomic data, and clinical notes.

To tackle this challenge, the authors propose the One-Versus-Others (OvO) attention mechanism. Its computational cost grows linearly with the number of modalities, requiring only $n$ attention operations, rather than quadratically as with existing cross-attention or self-attention mechanisms. As a result, OvO attention substantially reduces computational cost while maintaining or improving performance.

### Main Contributions of the Paper

1. **Proposing the OvO Attention Mechanism**: a new multimodal integration method whose cost is linear in the number of modalities, making it suitable for data with many modalities.
2. **Validating the Method's Effectiveness**: experiments on three clinical datasets (MIMIC-IV with chest X-ray (CXR) data, TADPOLE, and eICU) show that OvO attention reduces computational cost while maintaining or improving performance.
3. **Performance Comparison**: compared with early fusion and cross-attention methods, OvO attention reduces computational load by at least 91.98% and performs better on certain tasks.

### Experimental Results

- **MIMIC-IV and CXR Data**: on a four-modality dataset, OvO attention reduced the computational cost from 67,633,152 FLOPs to 4,227,072 FLOPs (a 93.75% reduction) while performing well on the AUROC and AUPRC metrics.
- **TADPOLE Data**: on a six-modality dataset, OvO attention reduced the computational cost from 9,633,792 FLOPs to 405,504 FLOPs (a 95.79% reduction) and performed well on the mAUC and BCA metrics.
- **eICU Data**: on a six-modality dataset, OvO attention showed similar advantages, substantially reducing computational cost while achieving good performance on classification tasks.

In summary, the paper addresses the efficient integration of multimodal biomedical data by proposing OvO attention, offering a scalable alternative to pairwise cross-modal attention for practical applications.
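The reduction percentages quoted for MIMIC-IV/CXR and TADPOLE follow directly from the reported FLOP counts. A quick arithmetic check in Python; the helper name `percent_reduction` is ours, and the numbers are the ones stated in the experimental results:

```python
def percent_reduction(baseline_flops: int, ovo_flops: int) -> float:
    # Relative saving of OvO attention against the baseline, in percent.
    return 100.0 * (1.0 - ovo_flops / baseline_flops)


# FLOP counts as reported in the experimental results (baseline, OvO).
reported = {
    "MIMIC-IV + CXR (4 modalities)": (67_633_152, 4_227_072),
    "TADPOLE (6 modalities)": (9_633_792, 405_504),
}

for name, (baseline, ovo) in reported.items():
    # Print both the percentage saving and the speed-up factor.
    print(f"{name}: {percent_reduction(baseline, ovo):.2f}% fewer FLOPs, "
          f"{baseline / ovo:.2f}x cheaper")
```

This prints 93.75% (exactly a 16x saving) for MIMIC-IV/CXR and 95.79% (about 23.76x) for TADPOLE, matching the figures above.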

