Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast

Guangyin Bao, Qi Zhang, Duoqian Miao, Zixuan Gong, Liang Hu, Ke Liu, Yang Liu, Chongyang Shi
2024-02-04
Abstract: In real-world scenarios, multimodal federated learning often faces the practical challenge of intricate modality missing, which constrains the construction of federated frameworks and significantly degrades model inference accuracy. Existing solutions for missing modalities generally develop modality-specific encoders on clients and train modality fusion modules on servers. However, these methods are primarily constrained to specific scenarios with either unimodal clients or complete multimodal clients, and they struggle to generalize to intricate modality-missing scenarios. In this paper, we introduce a prototype library into the FedAvg-based federated learning framework, enabling the framework to alleviate the global model performance degradation caused by modality missing during both training and testing. The proposed method uses prototypes as masks representing missing modalities to formulate a task-calibrated training loss and a model-agnostic uni-modality inference strategy. In addition, a proximal term based on prototypes is constructed to enhance local training. Experimental results demonstrate the state-of-the-art performance of our approach. Compared to the baselines, our method improved inference accuracy by 3.7% with 50% modality missing during training and by 23.8% during uni-modality inference. Code is available at https://github.com/BaoGuangYin/PmcmFL.
Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
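The abstract names three prototype-based ingredients: a prototype library maintained inside a FedAvg-style loop, prototypes used as masks for missing modalities during training, and a prototype-based proximal term for local training. The following is a minimal sketch of those ideas under stated assumptions, not the authors' implementation: a prototype is taken to be the per-class mean embedding of one modality, the server averages client prototypes weighted by per-class sample counts, a missing modality is filled with the global prototype of the sample's (known) class, and the proximal term is assumed to be a squared Euclidean distance scaled by a hypothetical coefficient `mu`. The names `aggregate_prototypes`, `fill_missing`, and `proximal_term` are illustrative and do not come from the PmcmFL repository; the paper's exact loss terms may differ.

```python
from typing import Dict, List, Optional

Vector = List[float]  # an embedding; real code would use tensors


def aggregate_prototypes(
    client_protos: List[Dict[int, Vector]],
    client_counts: List[Dict[int, int]],
) -> Dict[int, Vector]:
    """Server step (assumed): per-class average of client prototypes,
    weighted by how many samples of that class each client holds."""
    totals: Dict[int, Vector] = {}
    weights: Dict[int, int] = {}
    for protos, counts in zip(client_protos, client_counts):
        for label, proto in protos.items():
            n = counts[label]
            acc = totals.setdefault(label, [0.0] * len(proto))
            for i, v in enumerate(proto):
                acc[i] += n * v
            weights[label] = weights.get(label, 0) + n
    return {label: [v / weights[label] for v in acc] for label, acc in totals.items()}


def fill_missing(
    feature: Optional[Vector], label: int, global_protos: Dict[int, Vector]
) -> Vector:
    """Training-time mask (assumed): if this modality is absent (None),
    substitute a copy of the global prototype of the sample's class."""
    return list(global_protos[label]) if feature is None else feature


def proximal_term(
    feature: Vector, label: int, global_protos: Dict[int, Vector], mu: float = 0.1
) -> float:
    """Hypothetical prototype proximal term: (mu / 2) * ||feature - p_label||^2."""
    proto = global_protos[label]
    return 0.5 * mu * sum((a - b) ** 2 for a, b in zip(feature, proto))


# Usage: two clients report class-0 text prototypes; client A also has class 1.
client_a = {0: [1.0, 0.0], 1: [0.0, 1.0]}
client_b = {0: [3.0, 0.0]}
global_text = aggregate_prototypes([client_a, client_b], [{0: 1, 1: 2}, {0: 3}])
print(global_text)                         # {0: [2.5, 0.0], 1: [0.0, 1.0]}
print(fill_missing(None, 0, global_text))  # [2.5, 0.0]
```

Weighting by per-class counts mirrors how FedAvg weights client updates by local data size; here client B's three class-0 samples pull the global class-0 prototype toward its own `[3.0, 0.0]`.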
What problem does this paper attempt to address?
The paper addresses the prevalent problem of missing modalities in multimodal federated learning (mFL). Existing mFL methods are largely limited to specific scenarios in which clients are either unimodal or fully multimodal, and they suffer from severe task drift between clients and the server, which makes it difficult for them to generalize to complex modality-missing situations. To address this, the paper proposes the Prototype Mask and Contrast framework (PmcmFL), built on the following ideas:

1. **Handling complex modality missing**: A prototype library is introduced into the FedAvg-based framework to alleviate the performance degradation caused by missing modalities during both training and inference.
2. **Calibrating task drift**: Prototypes serve as global prior knowledge that stands in for missing modalities during cross-modal fusion and calibrates task drift through a task-calibrated training loss.
3. **Improving inference accuracy**: At inference time, prototypes act as masks for missing modalities; the framework locates the semantically closest prototype using one of several matching algorithms, yielding a model-agnostic uni-modality inference strategy.

With these components, PmcmFL handles complex modality missing during both training and inference. Experiments show that it outperforms existing baselines under different modality-missing rates: compared with the baselines, it improves inference accuracy by 3.7% with 50% of modalities missing during training and by 23.8% in uni-modality inference.
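At test time the sample's label is unknown, so a missing modality cannot simply be filled with "the prototype of its class". One simple matching rule, assumed here purely for illustration, is to compare the available modality's embedding against that modality's prototypes by cosine similarity, take the best-matching class, and substitute that class's prototype for the missing modality. The paper mentions several matching algorithms; cosine nearest-prototype is only one plausible choice, and `mask_missing_modality` is a hypothetical name rather than a function from the PmcmFL code.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mask_missing_modality(
    present_feat: Vector,
    present_protos: Dict[int, Vector],
    missing_protos: Dict[int, Vector],
) -> Tuple[int, Vector]:
    """Test-time mask (assumed matching rule): the label is unknown, so pick
    the class whose prototype in the present modality is most similar to
    present_feat, then return that class's prototype in the missing modality."""
    best = max(present_protos, key=lambda c: cosine(present_feat, present_protos[c]))
    return best, missing_protos[best]


# Usage: an image-only sample; the text modality is missing.
image_protos = {0: [1.0, 0.0], 1: [0.0, 1.0]}
text_protos = {0: [0.5, 0.5, 0.0], 1: [0.0, 0.5, 0.5]}
label, text_stub = mask_missing_modality([0.9, 0.2], image_protos, text_protos)
print(label, text_stub)  # 0 [0.5, 0.5, 0.0]
```

Because the substitute lies in the same embedding space as a real feature of the missing modality, an unchanged fusion model can consume it, which is consistent with the abstract's description of the inference strategy as model-agnostic.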