Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality

Huy Q. Le, Chu Myaet Thwal, Yu Qiao, Ye Lin Tun, Minh N. H. Nguyen, Choong Seon Hong
2024-01-25
Abstract: Multimodal federated learning (MFL) has emerged as a decentralized machine learning paradigm, allowing multiple clients with different modalities to collaborate on training a machine learning model across diverse data sources without sharing their private data. However, challenges, such as data heterogeneity and severely missing modalities, pose crucial hindrances to the robustness of MFL, significantly impacting the performance of global model. The absence of a modality introduces misalignment during the local training phase, stemming from zero-filling in the case of clients with missing modalities. Consequently, achieving robust generalization in global model becomes imperative, especially when dealing with clients that have incomplete data. In this paper, we propose Multimodal Federated Cross Prototype Learning (MFCPL), a novel approach for MFL under severely missing modalities by conducting the complete prototypes to provide diverse modality knowledge in modality-shared level with the cross-modal regularization and modality-specific level with cross-modal contrastive mechanism. Additionally, our approach introduces the cross-modal alignment to provide regularization for modality-specific features, thereby enhancing overall performance, particularly in scenarios involving severely missing modalities. Through extensive experiments on three multimodal datasets, we demonstrate the effectiveness of MFCPL in mitigating these challenges and improving the overall performance.
Machine Learning
What problem does this paper attempt to address?
This paper addresses two main challenges in multimodal federated learning (MFL):

1. **Data Heterogeneity**: The data distributions of different clients can vary greatly, which makes training harder and degrades the performance of the global model.
2. **Severely Missing Modalities**: In practice, some clients cannot provide data for every modality (for example, because of sensor failures, differing hardware platforms, or operational problems). Missing modalities further reduce the robustness and performance of the model.

To address these problems, the authors propose Multimodal Federated Cross Prototype Learning (MFCPL). MFCPL supplies diverse modality knowledge through complete prototypes and combines three components to improve the robustness and generalization of the model: cross-modal prototypes regularization (CMPR), cross-modal prototypes contrastive (CMPC), and cross-modal alignment (CMA).

The main contributions of MFCPL include:

- **Introducing complete prototypes in heterogeneous MFL for the first time** to address the challenge of missing modalities.
- **Proposing a novel multimodal federated learning framework, MFCPL**, which learns the global model through two levels of complete prototypes (modality-specific representations and modality-shared representations).
- **Verifying the effectiveness of MFCPL through extensive experiments** and demonstrating its superiority on three multimodal federated datasets.

According to the authors, these components work together so that MFCPL maintains model performance and robustness even when modalities are severely missing.
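The summary above does not say how the complete prototypes are built. As a rough illustration only, the sketch below assumes that a prototype is the mean feature vector of a class, a common choice in prototype-based federated learning, and that the server averages each (modality, class) prototype over the clients that actually hold that modality. The function names, the toy data, and the unweighted averaging are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def class_prototypes(features, labels):
    """Mean feature vector per class (assumed definition of a prototype)."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def aggregate_prototypes(client_protos):
    """Average each (modality, class) prototype over the clients that have it.

    client_protos: list of dicts {modality: {class: vector}}. A client that
    lacks a modality simply has no entry for it, so it does not contribute
    to (or distort) that modality's prototypes.
    """
    collected = defaultdict(list)
    for protos in client_protos:
        for modality, per_class in protos.items():
            for label, vec in per_class.items():
                collected[(modality, label)].append(vec)
    return {
        key: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for key, vecs in collected.items()
    }


# Hypothetical toy data: client A holds both modalities, client B lacks audio.
client_a = {
    "image": class_prototypes([[1.0, 3.0], [3.0, 5.0]], [0, 0]),
    "audio": class_prototypes([[2.0, 2.0]], [0]),
}
client_b = {"image": class_prototypes([[4.0, 6.0]], [0])}

global_protos = aggregate_prototypes([client_a, client_b])
```

In this toy setup, client B could still receive an audio prototype for class 0 from the server even though it has no audio data locally, which is the kind of cross-modal knowledge the summary attributes to complete prototypes. A sample-weighted average across clients would be an equally plausible design choice.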