Integrated Method of Deep Learning and Large Language Model in Speech Recognition

Xingqi Wang, Zhuoyue Wang, Mingxiu Sui, Bo Guan, Jin Cao, Zixiang Wang
DOI: https://doi.org/10.1109/ICEICT61637.2024.10671048
2024-07-31
Abstract: This research explores integrating deep learning with large language models in speech recognition to improve accuracy and handle complex contexts. A deep neural network (DNN), a convolutional neural network (CNN), a long short-term memory (LSTM) network, and a Transformer-based large language model are combined into an integrated acoustic and language model framework. Experiments on the TIMIT, LibriSpeech, and Common Voice datasets show that the ensemble model significantly improves both word error rate (WER) and real-time factor (RTF) compared with traditional models. The model also adapts better to multiple languages and to accent variation. The results suggest that integrating these technologies can effectively enhance the performance of speech recognition systems in complex environments, pointing to new directions for future development.
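The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: WER as word-level edit distance divided by the number of reference words, and RTF as processing time divided by audio duration. The function names `word_error_rate` and `real_time_factor` are hypothetical and are not taken from the paper, whose exact evaluation procedure is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio (below 1 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

Under these definitions, lower is better for both metrics, so "improves" in the abstract means a reduction in WER and in RTF.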
Linguistics, Computer Science