Abstract: Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer lies in learning dataset-specific features and digesting their knowledge in a single model. Previous methods achieve this by having separate correction language models, resulting in a significant increase in parameters. In this work, we present Mixture-of-Experts as a solution, highlighting that MoEs are much more than a scalability tool. We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text and vision-to-text datasets by learning to route each dataset's tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that we reach new state-of-the-art performance, achieving an average relative WER reduction of 5.0% and substantial improvements in BLEU scores for speech and translation tasks. On zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-Opus with 15.5% to 27.6% relative WER reduction on the Hyporadise benchmark. NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
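The routing idea in the abstract ("route each dataset's tokens to its mapped expert") can be illustrated with a small sketch. This is not the authors' implementation: the task names, the expert indices in `TASK_TO_EXPERT`, the top-2 setting, and the rule that the mapped expert is forced into the selection during training are all assumptions made for illustration.

```python
import math

# Hypothetical dataset-to-expert mapping; names and indices are illustrative only.
TASK_TO_EXPERT = {"asr": 0, "st_mt": 1, "ocr": 2, "tec": 3}


def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_logits, k=2, task=None):
    """Return {expert_index: weight} for a single token.

    Assumption: when the task label is known (training), the token's mapped
    expert is always selected and the remaining k-1 slots go to the gate's
    highest-scoring experts. With task=None (inference), the gate alone
    picks the top-k experts.
    """
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    chosen = []
    if task is not None:
        chosen.append(TASK_TO_EXPERT[task])
    for idx in ranked:
        if len(chosen) == k:
            break
        if idx not in chosen:
            chosen.append(idx)

    # Renormalise so the selected experts' weights sum to 1.
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


if __name__ == "__main__":
    logits = [2.0, 0.5, 0.1, 1.0]        # made-up gate scores for 4 experts
    print(route(logits))                 # inference: gate decides alone
    print(route(logits, task="ocr"))     # training: OCR expert is forced in
```

In this sketch a token from an OCR dataset always reaches the OCR expert during training, while the second slot still lets the gate share knowledge with another expert. At inference no task label is needed.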

What problem does this paper attempt to address?

### The problem the paper attempts to solve

The paper asks how to effectively train a general post-recognition error-correction model so that it can handle datasets from multiple domains. Specifically, it proposes a multi-task correction model, **NeKo**, which aims to improve the post-recognition results of speech, text and visual inputs through the Mixture-of-Experts (MoE) method.

### Main problems and background

1. **Multi-modal post-recognition correction**:
   - Humans have strong capabilities in multiple modalities such as speech recognition, visual pattern recognition, and semantic and text interpretation, but these capabilities are not perfect and misrecognition errors often occur.
   - Despite these misrecognitions, humans can still communicate efficiently using speech, language or facial expressions, even when the conversation contains inaccurate vocabulary and ambiguous accents.
2. **Limitations of existing methods**:
   - Traditional post-recognition correction methods usually rely on separate correction language models, which leads to a significant increase in the number of parameters.
   - Fine-tuning large language models (LLMs) directly on a mixture of different error-correction datasets leads to sub-optimal performance, because the datasets differ in input modalities, output formats, error types and domain characteristics.
3. **Advantages of Mixture-of-Experts (MoE)**:
   - An MoE model combines multiple expert networks with a gating (routing) network that learns to send each input to the most appropriate experts, yielding more specialized and fine-grained representations.
   - This allows the model to share knowledge between tasks while capturing the specific characteristics of each task.

### Main contributions of the paper

1. **Introduction of NeKo**:
   - The paper proposes a large language model (LLM) for multi-task error correction that uses a task-oriented Mixture-of-Experts (MoE) method to handle multiple post-recognition correction tasks.
   - To the best of the authors' knowledge, this is the first work to explore the use of MoE for multi-task error correction.
2. **Cross-modal post-recognition correction evaluation**:
   - In the new cross-modal post-recognition correction evaluation, NeKo serves as a strong open-source baseline for automatic speech recognition (ASR), speech translation (ST), optical character recognition (OCR) and textual error correction (TEC).
   - Experimental results show that NeKo reaches a new state of the art on the ASR task as a multi-task error-correction model.
3. **Emergent capabilities of cross-task correction**:
   - The paper reports emergent capabilities of NeKo in cross-task correction. The authors describe it as the first multi-task correction method of this kind, providing a new direction for the design of general post-recognition language models.
4. **Open-source plan**:
   - The authors plan to open-source the NeKo model, the newly created dataset and the training process under the CC BY-SA 4.0 license to support reproducibility and encourage future research.

### Experimental results

1. **ASR task**:
   - Experiments on the Open ASR Leaderboard show that NeKo achieves an average relative WER reduction of 5.0% across multiple datasets.
   - The improvement is especially pronounced on more challenging datasets such as AMI (conversational speech) and VoxPopuli (accented speech).
   - In zero-shot evaluation on the Hyporadise benchmark, NeKo outperforms GPT-3.5 and Claude-Opus with relative WER reductions ranging from 15.5% to 27.6%.
2. **ST and MT tasks**:
   - Experiments on the HypoTranslate dataset show that NeKo performs strongly in both zero-shot and few-shot settings, with substantial improvements in BLEU scores.
   - NeKo is also competitive on the Japanese and Chinese machine translation tasks of WMT'20.
3. **OCR task**:
   - Experiments on the Post-OCR Correction dataset show that NeKo also improves OCR error correction.
4. **TEC task**:
   - Experiments on the CoEdIT dataset show that NeKo performs strongly on grammar-correction and coherence-improvement tasks, verifying its effectiveness in handling text-editing instructions.

### Conclusion

Through its task-oriented Mixture-of-Experts method, NeKo achieves significant performance improvements in multi-modal post-recognition correction, demonstrating its ability to handle multiple tasks and domain datasets in a single model.
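The WER figures quoted in the experimental results are relative reductions, which are easy to confuse with absolute percentage-point drops. The sketch below shows, under the standard word-level edit-distance definition of WER, how such a relative reduction would be computed. The function names and the example numbers are illustrative and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, corrected_wer: float) -> float:
    """Fraction of the baseline error removed by the corrector."""
    return (baseline_wer - corrected_wer) / baseline_wer


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    before = word_error_rate(ref, "the cat sit on mat")      # recognizer output
    after = word_error_rate(ref, "the cat sat on mat")       # after correction
    print(before, after, relative_reduction(before, after))
```

With made-up numbers, moving from 10.0% to 9.5% WER is a 5% relative reduction even though the absolute change is only 0.5 percentage points.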

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
