Abstract: Large language models (LLMs) have driven unprecedented advances across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to emergent abilities (e.g., in-context learning) that are not present in small models. Within this context, the mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal computation overhead, gaining significant attention from academia and industry. Despite its growing prevalence, a systematic and comprehensive review of the MoE literature is still lacking. This survey seeks to bridge that gap, serving as a resource for researchers delving into the intricacies of MoE. We first briefly introduce the structure of the MoE layer and then propose a new taxonomy of MoE. Next, we review the core designs of various MoE models, covering both algorithmic and system-level aspects, alongside collections of available open-source implementations, hyperparameter configurations, and empirical evaluations. Furthermore, we delineate the multifaceted applications of MoE in practice and outline some potential directions for future research. To facilitate ongoing updates and the sharing of cutting-edge developments in MoE research, we have established a resource repository accessible at https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts.

A Single Loop EM Algorithm for the Mixture of Experts Architecture

Asymptotic Convergence Properties of the EM Algorithm for Mixture of Experts

Learning Mixtures of Experts with EM

A Modified Mixtures of Experts Architecture for Classification with Diverse Features

An Efficient EM Approach to Parameter Learning of the Mixture of Gaussian Processes

An MCMC Based EM Algorithm for Mixtures of Gaussian Processes

Extended Mixture of MLP Experts by Hybrid of Conjugate Gradient Method and Modified Cuckoo Search

A Mixed Evolutionary Algorithm to Solve the O-D Matrix Estimation Problem

A Novel Split and Merge EM Algorithm for Gaussian Mixture Model

On Least Square Estimation in Softmax Gating Mixture of Experts

Big Learning Expectation Maximization

Network EM Algorithm for Gaussian Mixture Model in Decentralized Federated Learning

On the Behavior of the Expectation-Maximization Algorithm for Mixture Models

An Effective EM Algorithm for Mixtures of Gaussian Processes Via the MCMC Sampling and Approximation

Fast Deep Mixtures of Gaussian Process Experts

Multi-view EM algorithm for finite mixture models

A Survey on Mixture of Experts

AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference

Adaptive Gating in Mixture-of-Experts based Language Models

Mixture of robust Gaussian processes and its hard-cut EM algorithm with variational bounding approximation