Abstract: Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR). However, most research focuses on utterances from short-duration speech recordings, which are the predominant form of speech data for supervised ASR training. This paper investigates the effectiveness of LLMs for error correction in full text generated by ASR systems from longer speech recordings, such as transcripts from podcasts, news broadcasts, and meetings. First, we develop a Chinese dataset for full-text error correction, named ChFT, using a pipeline that comprises text-to-speech synthesis, ASR, and an error-correction pair extractor. This dataset enables us to correct errors across contexts, at both the full-text and segment levels, and to address a broader range of error types, such as punctuation restoration and inverse text normalization, thus making the correction process comprehensive. Second, we fine-tune a pre-trained LLM on the constructed dataset using a diverse set of prompts and target formats, and evaluate its performance on full-text error correction. Specifically, we design prompts based on full text and on segments, with various output formats, such as directly corrected text and JSON-based error-correction pairs. Across several test settings, including homogeneous, up-to-date, and hard test sets, we find that the fine-tuned LLMs perform well in the full-text setting with different prompts, each presenting its own strengths and weaknesses. This establishes a promising baseline for further research. The dataset is available on the website.

What problem does this paper attempt to address?

The paper addresses error correction for long texts generated by automatic speech recognition (ASR) systems. Most existing research focuses on sentence-level correction of utterances from short recordings, which are the kind of data typically used for supervised ASR training. Applied to long texts (such as transcripts of podcasts, news broadcasts, and meetings), sentence-level correction cannot capture the context of the whole conversation or document, and it is computationally expensive. To address this, the paper makes three contributions:

1. **Constructing a Chinese Full-Text Error Correction Dataset (ChFT)**: A dataset built specifically for full-text error correction, using a pipeline of text-to-speech synthesis, ASR, and error-correction pair extraction. It supports correction at both the full-text and segment levels and covers additional error types such as punctuation restoration and inverse text normalization.
2. **Using Large Language Models (LLMs) for Error Correction**: A pre-trained LLM is fine-tuned on the dataset with a range of prompts and target formats. The prompts are based on either the full text or segments, and the output formats include directly corrected text and JSON-based error-correction pairs.
3. **Experimental Evaluation**: The fine-tuned LLMs are evaluated with each prompt under several test settings: homogeneous, up-to-date, and hard test sets. The results show that the fine-tuned LLMs perform well on full-text error correction, with each prompt having its own strengths and weaknesses.

Overall, the paper explores and evaluates the potential of LLMs for long-text error correction and provides a promising baseline for subsequent research.
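To make the idea of error-correction pairs more concrete, the following is a minimal sketch of how such pairs could be extracted from an ASR hypothesis and its reference text, and then applied back to the hypothesis. It is not the paper's extractor: the character-level alignment with Python's `difflib.SequenceMatcher` is a stand-in, and the function names and the JSON fields `start`, `error`, and `correction` are assumptions chosen for illustration, not the ChFT schema.

```python
import json
from difflib import SequenceMatcher


def extract_pairs(hypothesis: str, reference: str) -> list[dict]:
    """Align hypothesis and reference character by character and
    return every span where they differ (hypothetical pair format)."""
    matcher = SequenceMatcher(None, hypothesis, reference, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # An empty "error" means an insertion (e.g. missing punctuation);
        # an empty "correction" means a deletion.
        pairs.append({
            "start": i1,
            "error": hypothesis[i1:i2],
            "correction": reference[j1:j2],
        })
    return pairs


def apply_pairs(hypothesis: str, pairs: list[dict]) -> str:
    """Apply pairs from right to left so earlier offsets stay valid."""
    text = hypothesis
    for pair in sorted(pairs, key=lambda p: p["start"], reverse=True):
        start = pair["start"]
        end = start + len(pair["error"])
        text = text[:start] + pair["correction"] + text[end:]
    return text


# Made-up example: a homophone error plus missing punctuation.
asr_output = "今天天气很好我们去公园完吧"
ground_truth = "今天天气很好，我们去公园玩吧。"

example_pairs = extract_pairs(asr_output, ground_truth)
print(json.dumps(example_pairs, ensure_ascii=False, indent=2))
print(apply_pairs(asr_output, example_pairs))
```

In this sketch, a model asked for JSON output only has to emit the differing spans, while a model asked for directly corrected text has to reproduce the whole input; this is one way to picture the trade-off between the two target formats described above.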

Full-text Error Correction for Chinese Speech Recognition with Large Language Model
