Abstract: We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end (E2E) automatic speech recognition (ASR) models using contextual information such as names and places. Although CSC has achieved reasonable improvements on the biasing problem, two drawbacks still limit further accuracy gains. First, because text-only hypotheses carry limited information, or because the ASR model performs poorly on rare domains, the CSC model may fail to correct phrases with similar pronunciations, or to handle anti-context cases in which none of the biasing phrases is present in the utterance. Second, there is a discrepancy between how CSC is trained and how it is used at inference: the bias list is selected randomly during training, whereas at inference the ground-truth phrase may be more similar to the other phrases in the list. To address these limitations, in this paper we propose an improved non-autoregressive (NAR) spelling correction model for contextual biasing in E2E neural transducer-based ASR systems, which improves on the previous CSC model in two ways. First, we feed acoustic information into CSC through an external attention, alongside the text hypotheses, so that the model can better distinguish the target phrase from dissimilar or irrelevant phrases. Second, we design a semantic-aware data augmentation scheme for the training phase that reduces the mismatch between training and inference and further boosts biasing accuracy. Experiments show that the improved method outperforms the baseline ASR+Biasing system with up to a 20.3% relative gain in name recall, and achieves consistent improvements over the previous CSC method across different name coverage ratios of the bias list.
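The abstract does not say how the semantic-aware augmentation picks distractor phrases, so the following is only a minimal sketch of the general idea under stated assumptions: instead of filling the training bias list purely at random, some distractors are chosen because they are close to the ground-truth phrase, which makes training lists look more like the hard lists seen at inference. The names `build_bias_list`, `string_similarity` and `num_hard` are hypothetical, and the default similarity is a surface-string ratio from Python's `difflib`, used here as a stand-in for a genuinely semantic (for example embedding-based) similarity, which could be passed through the same `similarity` parameter. Passing `target=None` produces an anti-context example whose bias list contains no phrase from the utterance.

```python
import difflib
import random
from typing import Callable, List, Optional, Sequence


def string_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] (assumption): difflib's surface-string
    ratio, NOT a semantic measure. An embedding-based score could replace it."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_bias_list(
    target: Optional[str],
    candidates: Sequence[str],
    list_size: int,
    num_hard: int,
    rng: random.Random,
    similarity: Callable[[str, str], float] = string_similarity,
) -> List[str]:
    """Hypothetical training-time bias-list sampler.

    target:   ground-truth phrase, or None for an anti-context example
              (no biasing phrase occurs in the utterance).
    num_hard: number of distractors taken as the phrases most similar to
              the target; the remaining slots are filled uniformly at random.
    """
    # Never sample the ground truth as a distractor.
    pool = [c for c in candidates if c != target]

    hard: List[str] = []
    if target is not None and num_hard > 0:
        # Rank candidates by similarity to the target and keep the closest ones.
        ranked = sorted(pool, key=lambda c: similarity(target, c), reverse=True)
        hard = ranked[:num_hard]

    # Fill the remaining slots with random distractors (the original,
    # purely random scheme), leaving room for the target if present.
    remaining = [c for c in pool if c not in hard]
    slots = list_size - len(hard) - (1 if target is not None else 0)
    randoms = rng.sample(remaining, k=max(0, min(slots, len(remaining))))

    bias = hard + randoms + ([target] if target is not None else [])
    rng.shuffle(bias)  # avoid leaking the target's position
    return bias


if __name__ == "__main__":
    names = ["Katherine", "Catherine", "Kathryn", "Robert", "Li Wei", "Oliver", "Priya"]
    print(build_bias_list("Katherine", names, list_size=4, num_hard=2, rng=random.Random(0)))
    print(build_bias_list(None, names, list_size=4, num_hard=2, rng=random.Random(1)))
```

With `target="Katherine"` and `num_hard=2`, the two near-spellings "Catherine" and "Kathryn" are always included as distractors, and the last slot is random. Raising `num_hard` relative to `list_size` pushes the training lists toward the confusable conditions described for inference; setting it to zero recovers random selection.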

Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation