Abstract: We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. The inserted low-rank matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, where it reduces training times by factors between 3.6 and 5.4.
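To make the low-rank decomposition in the abstract concrete, here is a minimal sketch in plain Python of one linear layer with a frozen weight and a trainable low-rank update. It is not the authors' implementation: the class name `LoRALinear`, the helper `matvec`, the layer sizes, the rank, and the `alpha / r` scaling are all illustrative assumptions, and the zero initialization of `B` follows the convention of the original LoRA formulation rather than anything stated in this text.

```python
import random

def matvec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical illustration only; sizes and scaling are assumptions.
    """

    def __init__(self, d_in, d_out, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Pretrained weight: kept frozen during adaptation.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Low-rank factors: the only parameters that would be trained.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_fraction(self):
        frozen = len(self.W) * len(self.W[0])
        trainable = len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
        return trainable / (frozen + trainable)

layer = LoRALinear(d_in=64, d_out=64, r=2)
x = [1.0] * 64
print(f"trainable fraction: {layer.trainable_fraction():.4f}")
```

With these toy sizes the trainable share is far larger than the 0.08% reported in the abstract. The share shrinks as the hidden size grows relative to the rank, and when most of the model (for example, the embeddings) carries no adapter at all.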

What problem does this paper attempt to address?

This paper addresses how to use large pre-trained language models (such as BERT) efficiently in the second-pass rescoring stage of Automatic Speech Recognition (ASR) systems while reducing computational cost and the number of trainable parameters. Specifically, the authors propose a method based on Low-rank Adaptation (LoRA) that inserts low-rank matrices into each Transformer layer to fine-tune the BERT model for new domains. The method updates only a very small part (0.08%) of the pre-trained model's parameters, which significantly reduces training time and memory usage while maintaining good rescoring performance. The main contributions of the paper include:

1. **Low-rank Adaptation Method**: Inserting low-rank matrices into each Transformer layer, instead of fine-tuning the entire model, reduces the number of trainable parameters and the computational cost.
2. **Balance between Performance and Efficiency**: Experimental results show that the LoRA method achieves performance comparable to or even better than Full Fine-Tuning (FT) on multiple datasets, while training time is reduced by a factor of 6 and memory usage is reduced by 32%.
3. **Generalization Ability**: The LoRA method not only performs well within the target domain but also outperforms other parameter-efficient fine-tuning methods in non-target domains.
4. **Multi-loss Training**: To further improve generalization, the authors introduce a correlation-based regularization loss, used in combination with the Minimum Word Error Rate (MWER) loss, which alleviates over-fitting.

Through these contributions, the paper provides a method for efficiently using large pre-trained language models for speech recognition rescoring in resource-constrained settings.
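The multi-loss training described above can be sketched as follows. The MWER term uses the common formulation: the expected number of word errors over an N-best list, measured relative to the list average. The exact form of the correlation-based regularizer is not given in this text, so `correlation_penalty` below is a hypothetical stand-in (mean squared Pearson correlation between distinct feature dimensions). The weight `lam`, the "higher score is better" convention, and all function names are likewise assumptions.

```python
import math

def mwer_loss(scores, word_errors):
    """Expected word errors over an N-best list, relative to the list average.

    scores: rescorer scores, higher = preferred (assumed convention).
    word_errors: word-error count of each hypothesis against the reference.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    avg_err = sum(word_errors) / len(word_errors)
    return sum(p * (e - avg_err) for p, e in zip(probs, word_errors))

def correlation_penalty(features):
    """Hypothetical regularizer: mean squared Pearson correlation between
    distinct feature dimensions. `features` is a list of equal-length vectors."""
    n, d = len(features), len(features[0])
    cols = [[row[j] for row in features] for j in range(d)]
    means = [sum(c) / n for c in cols]
    centered = [[v - mu for v in c] for c, mu in zip(cols, means)]
    # Guard against constant columns, whose norm would be zero.
    norms = [math.sqrt(sum(v * v for v in c)) or 1.0 for c in centered]
    total, pairs = 0.0, 0
    for i in range(d):
        for j in range(i + 1, d):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            corr = dot / (norms[i] * norms[j])
            total += corr * corr
            pairs += 1
    return total / pairs if pairs else 0.0

def total_loss(scores, word_errors, features, lam=0.1):
    """MWER loss plus the weighted (assumed) correlation regularizer."""
    return mwer_loss(scores, word_errors) + lam * correlation_penalty(features)
```

The MWER term is negative when the rescorer puts most probability on hypotheses with fewer errors than average and positive when it favors worse ones, so minimizing it pushes scores toward low-error hypotheses.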

Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

Sparse Low-rank Adaptation of Pre-trained Language Models

LoRA$^2$: Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

Discriminative Speech Recognition Rescoring with Pre-trained Language Models

ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models

Distillation Strategies for Discriminative Speech Recognition Rescoring

SwitchLoRA: Switched Low-Rank Adaptation Can Learn Full-Rank Information

Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates

Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning

The Expressive Power of Low-Rank Adaptation

Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach

Low-Rank Adaptation for Multilingual Summarization: An Empirical Study

ResLoRA: Identity Residual Mapping in Low-Rank Adaption

Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

LoRTA: Low Rank Tensor Adaptation of Large Language Models

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation

LoRA-Mini: Adaptation Matrices Decomposition and Selective Training