Abstract:Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial computational resources. To address these issues, we introduce Modality-aware Low-Rank Adaptation (MoRA), a computationally efficient method. MoRA projects each input to a low intrinsic dimension but uses different modality-aware up-projections for modality-specific adaptation in cases of missing modalities. Practically, MoRA integrates into the first block of the model, significantly improving performance when a modality is missing. It requires minimal computational resources, with less than 1.6% of the trainable parameters needed compared to training the entire model. Experimental results show that MoRA outperforms existing techniques in disease diagnosis, demonstrating superior performance, robustness, and training efficiency.

What problem does this paper attempt to address?

This paper attempts to solve two main problems encountered when applying multimodal pre - training models in disease diagnosis: 1. **Missing Modality**: In the actual clinical environment, data of certain modalities may be missing (for example, the patient's chest X - ray image is complete, but some corresponding annotations are missing). In this case, the performance of the multimodal pre - training model will be significantly degraded. Specifically, when the modality data is incomplete, the existing methods perform poorly, especially when the missing modality settings in the training and testing sets are different. 2. **High demand for computing resources**: Most multimodal pre - training models are based on the large - scale Transformer architecture, and fine - tuning the entire pre - training model requires a large amount of computing resources. This makes the computing cost too high in practical applications, especially when fine - tuning on small data sets. To solve these problems, the authors propose the **Modality - aware Low - Rank Adaptation (MoRA)** method. The core idea of MoRA is to project each input into a low - dimensional space and use a modality - specific up - projection matrix to adapt to the missing situations of different modalities. Specifically: - MoRA only needs to be integrated into the first module of the model, thereby greatly reducing the number of parameters that need to be trained (accounting for less than 1.6% of the total number of parameters), and significantly reducing the demand for computing resources. - MoRA shows stronger robustness and higher performance when dealing with missing modalities, especially when the text modality is severely missing. The experimental results show that MoRA outperforms the existing methods under various missing modality ratios, which not only improves the accuracy and robustness of diagnosis but also enhances the training efficiency. ### Key contributions - Proposed the MoRA method to improve the performance and robustness of multimodal pre - training models in the case of modality missing in the training and testing sets. - The experimental results prove that MoRA achieves the state - of - the - art performance and robustness when dealing with missing modalities. - Verified the superior performance and robustness of MoRA through experiments with different missing rates. ### Summary MoRA is an efficient and robust method that can improve the accuracy of disease diagnosis in the case of multimodal data missing while significantly reducing the demand for computing resources.

MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality