Abstract: As machine learning models in critical fields increasingly grapple with multimodal data, they face the dual challenges of handling a wide array of modalities, often incomplete due to missing elements, and the temporal irregularity and sparsity of collected samples. Successfully leveraging this complex data, while overcoming the scarcity of high-quality training samples, is key to improving these models' predictive performance. We introduce "FuseMoE", a mixture-of-experts framework incorporated with an innovative gating function. Designed to integrate a diverse number of modalities, FuseMoE is effective in managing scenarios with missing modalities and irregularly sampled data trajectories. Theoretically, our unique gating function contributes to enhanced convergence rates, leading to better performance in multiple downstream tasks. The practical utility of FuseMoE in the real world is validated by a diverse set of challenging prediction tasks.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to solve several key challenges in multimodal data fusion, especially when dealing with FlexiModal data. Specifically, the authors propose a new Mixture-of-Experts framework named FuseMoE to address the following issues:

1. **Fusion of multiple modalities**:
   - Existing multimodal fusion methods can usually handle only a small number of modalities (such as two or three), and when the number of modalities increases, a large amount of cross-modal computation and model architecture modification are required. This limits their scalability in practical applications.
   - FuseMoE can flexibly handle any number of modalities, including missing modalities and irregularly sampled data.
2. **Handling of missing modalities**:
   - Many existing multimodal fusion methods are unable to handle missing modalities or use simple imputation methods, which may lead to sub-optimal performance.
   - FuseMoE effectively deals with the problem of missing modalities by dynamically adjusting the influence of the experts responsible for the missing modalities.
3. **Temporal irregularity and sparsity**:
   - Multimodal data often has complex temporal dynamic characteristics, including irregular sampling and sparsity. Existing methods usually ignore these problems or rely on inappropriate position embedding schemes.
   - FuseMoE introduces a multi-time attention mechanism to handle temporal irregularity and sparsity.
4. **Improvement of prediction performance**:
   - In key areas (such as medical prediction, sentiment analysis, etc.), machine learning models need to handle multimodal data and improve prediction performance in the case of scarce high-quality training samples.
   - FuseMoE improves the prediction performance of multiple downstream tasks by introducing a novel Laplace gating function and theoretically proving a better convergence rate.
5. **Theoretical guarantees and empirical verification**:
   - The authors not only theoretically prove that the Laplace gating function has better convergence behavior compared to the traditional Softmax gating function, but also conduct extensive empirical evaluations on multiple benchmark datasets to verify the effectiveness of FuseMoE.

### Summary

In general, by proposing the FuseMoE framework, this paper aims to solve problems such as flexibility, missing-modality handling, temporal irregularity, and prediction performance improvement in multimodal data fusion, and is especially suitable for FlexiModal data scenarios.
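The contrast between the Laplace gate and the traditional Softmax gate can be made concrete with a small sketch. The summary above does not give either formula, so the following is an illustration under stated assumptions rather than the authors' implementation: the Softmax gate is assumed to score each expert by the inner product of the input with a per-expert weight vector, and the Laplace gate by the negative Euclidean distance between the two. The function names, the top-k routing step, and the toy inputs are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_gate_scores(x, expert_weights):
    """Assumed Softmax-gate score: inner product <w_i, x> per expert."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in expert_weights]


def laplace_gate_scores(x, expert_weights):
    """Assumed Laplace-gate score: negative Euclidean distance -||w_i - x||."""
    return [
        -math.sqrt(sum((wi - xi) ** 2 for wi, xi in zip(w, x)))
        for w in expert_weights
    ]


def top_k_gate(scores, k):
    """Keep the k highest-scoring experts, normalize over them, zero the rest."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in top])
    gates = [0.0] * len(scores)
    for i, p in zip(top, probs):
        gates[i] = p
    return gates


if __name__ == "__main__":
    # Toy input and three hypothetical expert weight vectors.
    x = [1.0, 0.0]
    experts = [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0]]

    print("softmax gate:", top_k_gate(softmax_gate_scores(x, experts), k=2))
    print("laplace gate:", top_k_gate(laplace_gate_scores(x, experts), k=2))
```

In this toy setup the inner-product score favors the third expert simply because its weight vector has a large norm, while the distance-based score favors the first expert, whose weights coincide with the input. This bounded, distance-based scoring is one plausible intuition for why a Laplace-style gate might behave differently from a Softmax gate; the convergence claims themselves rest on the paper's theory, not on this sketch.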

FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion

Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts

Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion

DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency

Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference

Fusing Models with Complementary Expertise

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Tutel: Adaptive Mixture-of-Experts at Scale

MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts

AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference

MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts

Task-Customized Mixture of Adapters for General Image Fusion

FasterMoE

Cool-Fusion: Fuse Large Language Models without Training

Multi-Head Mixture-of-Experts

HMoE: Heterogeneous Mixture of Experts for Language Modeling

FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts

Sparse Fusion Mixture-of-Experts are Domain Generalizable Learners

MH-MoE: Multi-Head Mixture-of-Experts

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts