Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko
DOI: https://doi.org/10.1109/ASRU57964.2023.10389632
2023-10-10
Abstract: We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, with training times decreased by factors between 5.4 and 3.6.
Computation and Language, Artificial Intelligence, Machine Learning, Neural and Evolutionary Computing, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses how to use large pre-trained language models such as BERT efficiently in the second-pass rescoring stage of automatic speech recognition (ASR) systems while reducing computational cost and the number of trainable parameters. Specifically, the authors propose a method based on low-rank adaptation (LoRA): low-rank matrices are inserted into each Transformer layer to fine-tune the BERT model for new domains, instead of updating all of its weights. The method updates only a very small fraction (0.08%) of the pre-trained model parameters, which significantly reduces training time and memory usage while maintaining good rescoring performance.

The main contributions of the paper include:

1. **Low-rank Adaptation Method**: Inserting low-rank matrices in each Transformer layer, instead of fine-tuning the entire model, reduces the number of trained parameters and the computational cost.
2. **Balance between Performance and Efficiency**: Experimental results show that the LoRA method achieves performance comparable to or even better than full fine-tuning (FT) on multiple datasets, while training time is reduced by a factor of 6 and memory usage by 32%.
3. **Generalization Ability**: The LoRA method not only performs well within the target domain but also outperforms other parameter-efficient fine-tuning methods in non-target domains.
4. **Multi-loss Training**: To further improve generalization, the authors introduce a correlation-based regularization loss that is combined with the minimum word error rate (MWER) loss, which effectively alleviates overfitting.

Through these innovations, the paper provides a method for efficiently using large pre-trained language models for speech recognition rescoring in resource-constrained situations.
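The low-rank adaptation idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `LoRALinear`, the `alpha / rank` scaling, the zero initialization of `B`, and the toy dimensions are assumptions taken from the common LoRA formulation, in which a frozen weight `W0` is augmented by a trainable product `B @ A` of rank `r`. The example only shows why the trainable parameter count is small; it does not reproduce the 0.08% figure, which depends on the actual model and rank.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen dense weight W0 plus a trainable low-rank update B @ A (assumed form)."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # W0 stands in for a pre-trained weight; it would stay frozen during adaptation.
        self.W0 = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # A (rank x d_in) and B (d_out x rank) are the only trainable matrices.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so the adapted layer initially matches the frozen layer.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W0, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def total_params(self):
        return len(self.W0) * len(self.W0[0]) + self.trainable_params()


# Toy dimensions chosen for illustration only.
layer = LoRALinear(d_in=64, d_out=64, rank=2)
x = [1.0] * 64
y = layer.forward(x)
fraction = layer.trainable_params() / layer.total_params()
print(len(y), layer.trainable_params(), layer.total_params())
```

In this toy layer the two low-rank matrices hold 256 values against 4096 in the frozen weight. Because the frozen weights of a full model dominate the total, the trainable fraction shrinks further as the hidden size grows relative to the rank.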
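The MWER part of the multi-loss training mentioned above can also be sketched. The function below follows the commonly used MWER formulation over an n-best list: hypothesis scores are normalized into a posterior, and the loss is the expected number of word errors relative to the list average. The function name `mwer_loss`, the "higher score is better" sign convention, and the example numbers are assumptions for illustration. The correlation-based regularization term is omitted because its exact form is not given here.

```python
import math


def mwer_loss(scores, word_errors):
    """Expected relative word errors over an n-best list (assumed MWER form).

    scores: one model score per hypothesis; higher means more likely (assumption).
    word_errors: number of word errors of each hypothesis against the reference.
    """
    # Numerically stable softmax over the n-best scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    # Subtracting the mean error centers the loss around the n-best average.
    mean_err = sum(word_errors) / len(word_errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, word_errors))


# Hypothetical 3-best list with 0, 2, and 4 word errors.
errors = [0, 2, 4]
loss_good = mwer_loss([2.0, 1.0, 0.0], errors)  # best hypothesis ranked first
loss_bad = mwer_loss([0.0, 1.0, 2.0], errors)   # worst hypothesis ranked first
loss_flat = mwer_loss([1.0, 1.0, 1.0], errors)  # no preference among hypotheses
print(loss_good, loss_bad, loss_flat)
```

The loss is negative when probability mass concentrates on low-error hypotheses and positive when it concentrates on high-error ones, so minimizing it pushes the rescoring model to rank accurate hypotheses higher.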