FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion

Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, Suchi Saria
2024-05-23
Abstract: As machine learning models in critical fields increasingly grapple with multimodal data, they face the dual challenges of handling a wide array of modalities, often incomplete due to missing elements, and the temporal irregularity and sparsity of collected samples. Successfully leveraging this complex data, while overcoming the scarcity of high-quality training samples, is key to improving these models' predictive performance. We introduce "FuseMoE", a mixture-of-experts framework incorporated with an innovative gating function. Designed to integrate a diverse number of modalities, FuseMoE is effective in managing scenarios with missing modalities and irregularly sampled data trajectories. Theoretically, our unique gating function contributes to enhanced convergence rates, leading to better performance in multiple downstream tasks. The practical utility of FuseMoE in the real world is validated by a diverse set of challenging prediction tasks.
Machine Learning
### What problems does this paper attempt to solve?

This paper addresses several key challenges in multimodal data fusion, especially for FlexiModal data. Specifically, the authors propose a new Mixture-of-Experts framework named FuseMoE to address the following issues:

1. **Fusion of multiple modalities**:
   - Existing multimodal fusion methods can usually handle only a small number of modalities (such as two or three). As the number of modalities increases, they require a large amount of cross-modal computation and changes to the model architecture, which limits their scalability in practical applications.
   - FuseMoE can flexibly handle any number of modalities, including missing modalities and irregularly sampled data.
2. **Handling of missing modalities**:
   - Many existing multimodal fusion methods either cannot handle missing modalities or rely on simple imputation, which may lead to sub-optimal performance.
   - FuseMoE deals with missing modalities by dynamically adjusting the influence of the experts responsible for them.
3. **Temporal irregularity and sparsity**:
   - Multimodal data often has complex temporal dynamics, including irregular sampling and sparsity. Existing methods usually ignore these problems or rely on inappropriate position embedding schemes.
   - FuseMoE introduces a multi-time attention mechanism to handle temporal irregularity and sparsity.
4. **Improvement of prediction performance**:
   - In key areas (such as medical prediction and sentiment analysis), machine learning models need to handle multimodal data and improve prediction performance even when high-quality training samples are scarce.
   - FuseMoE improves prediction performance on multiple downstream tasks by introducing a novel Laplace gating function and theoretically proving a better convergence rate.
5. **Theoretical guarantees and empirical verification**:
   - The authors theoretically prove that the Laplace gating function has better convergence behavior than the traditional Softmax gating function, and they also conduct extensive empirical evaluations on multiple benchmark datasets to verify the effectiveness of FuseMoE.

### Summary

In general, by proposing the FuseMoE framework, this paper aims to solve the problems of flexibility, missing-modality handling, temporal irregularity, and prediction performance in multimodal data fusion, and it is especially suited to FlexiModal data scenarios.
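To make the contrast between Softmax gating and the Laplace gating function mentioned above more concrete, here is a minimal pure-Python sketch. It assumes that the Laplace-style gate scores each expert by the negative Euclidean distance between the input and a per-expert vector, whereas the conventional gate uses an inner product; this summary does not spell out the exact form, so treat it as an illustration rather than the authors' implementation. The function names, the toy vectors in `expert_keys`, and the choice of `k=2` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_gate_scores(x, expert_keys):
    """Conventional gate: inner product between the input and each expert vector."""
    return [sum(xi * wi for xi, wi in zip(x, w)) for w in expert_keys]


def laplace_gate_scores(x, expert_keys):
    """Laplace-style gate (assumed form): negative Euclidean distance to each expert vector."""
    return [
        -math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))
        for w in expert_keys
    ]


def top_k_gate(scores, k):
    """Keep the k highest scores, normalise them with softmax, and zero out the rest."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    gate = [0.0] * len(scores)
    for i, w in zip(top, weights):
        gate[i] = w
    return gate


# Toy input and three hypothetical expert vectors; the second has a large norm.
x = [1.0, 0.0]
expert_keys = [[1.0, 0.0], [10.0, 0.0], [0.0, 1.0]]

softmax_gate = top_k_gate(softmax_gate_scores(x, expert_keys), k=2)
laplace_gate = top_k_gate(laplace_gate_scores(x, expert_keys), k=2)

print("softmax gate:", softmax_gate)
print("laplace gate:", laplace_gate)
```

In this toy setup the inner-product gate routes almost all weight to the large-norm expert, while the distance-based gate prefers the expert whose vector is closest to the input and gives the large-norm expert zero weight. This only illustrates how the two scoring rules can differ; it says nothing about the convergence results claimed in the paper.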