One-Versus-Others Attention: Scalable Multimodal Integration for Biomedical Data

Michal Golovanevsky, Eva Schiller, Akira Nair, Eric Han, Ritambhara Singh, Carsten Eickhoff
2024-10-22
Abstract: Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $\binom{n}{2}$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.
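The scaling claim in the abstract can be illustrated with a small count of attention operations. This is a minimal sketch that only tallies operations as the abstract states them ($\binom{n}{2}$ for pairwise cross-modal attention, $n$ for OvO attention); the function names are hypothetical and the code does not implement either attention mechanism.

```python
from math import comb


def pairwise_attention_ops(n_modalities: int) -> int:
    """Attention operations for pairwise cross-modal attention: n choose 2."""
    return comb(n_modalities, 2)


def ovo_attention_ops(n_modalities: int) -> int:
    """Attention operations for OvO attention as stated in the abstract: n."""
    return n_modalities


# Compare the two counts as the number of modalities grows.
for n in range(2, 9):
    print(f"n={n}: pairwise={pairwise_attention_ops(n)}, OvO={ovo_attention_ops(n)}")
```

Under these counts the two approaches tie at three modalities, and the pairwise count pulls ahead from four modalities onward, which matches the abstract's remark that pairwise attention scales poorly beyond three modalities.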
Machine Learning
### What problem does this paper attempt to address?

This paper addresses the efficient integration of multimodal data in the biomedical field. Existing multimodal models that rely on pairwise cross-modal attention have high computational complexity and scale poorly beyond three modalities. The problem is particularly pronounced in healthcare, where medical data typically spans modalities such as X-rays, PET scans, MRI, genetic screening, genomic data, and clinical notes.

To tackle this challenge, the authors propose a new attention mechanism called One-Versus-Others (OvO) attention. Its computational complexity grows linearly with the number of modalities, rather than quadratically as with existing cross-attention or self-attention mechanisms. As a result, OvO attention significantly reduces computational costs while maintaining or improving performance.

### Main Contributions of the Paper

1. **Proposing the OvO Attention Mechanism**: A new method for multimodal integration with linear computational complexity, suitable for data with a large number of modalities.
2. **Validating the Method's Effectiveness**: The mechanism is evaluated on three clinical datasets (MIMIC-IV and CXR data, TADPOLE data, eICU data), showing that it can reduce computational costs while maintaining or improving performance.
3. **Performance Comparison**: Compared to existing early fusion and cross-attention methods, OvO attention reduces computational load by at least 91.98% and performs better on certain tasks.

### Experimental Results

- **MIMIC-IV and CXR Data**: On this four-modality dataset, OvO attention reduced the computational cost from 67,633,152 FLOPs to 4,227,072 FLOPs, a reduction of 93.75%, while performing well on AUROC and AUPRC metrics.
- **TADPOLE Data**: On this six-modality dataset, OvO attention reduced the computational cost from 9,633,792 FLOPs to 405,504 FLOPs, a reduction of 95.79%, and performed well on mAUC and BCA metrics.
- **eICU Data**: OvO attention showed similar advantages on this six-modality dataset, significantly reducing computational costs and achieving good performance in classification tasks.

In summary, this paper addresses the efficient integration of multimodal data in the biomedical field by proposing the OvO attention mechanism, providing a new solution for practical applications.
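The reduction percentages quoted for the MIMIC-IV and CXR data and for the TADPOLE data follow directly from the reported FLOP counts. The short check below recomputes them; the helper name `percent_reduction` and the dictionary layout are illustrative assumptions, and only the FLOP figures come from the summary.

```python
def percent_reduction(baseline_flops: int, reduced_flops: int) -> float:
    """Relative reduction in FLOPs, as a percentage of the baseline."""
    return 100.0 * (baseline_flops - reduced_flops) / baseline_flops


# (baseline FLOPs, OvO FLOPs) as quoted in the summary above.
reported_flops = {
    "MIMIC-IV and CXR": (67_633_152, 4_227_072),
    "TADPOLE": (9_633_792, 405_504),
}

for dataset, (baseline, ovo) in reported_flops.items():
    print(f"{dataset}: {percent_reduction(baseline, ovo):.2f}% fewer FLOPs")
```

Rounded to two decimals, the results agree with the 93.75% and 95.79% figures stated for these two datasets.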