Exploration of English speech translation recognition based on the LSTM RNN algorithm

Qiwei Yuan, Yu Dai, Guangming Li
DOI: https://doi.org/10.1007/s00521-023-08462-8
2023-03-23
Neural Computing and Applications
Abstract: In today’s information society, the demand for intelligence is increasing daily. English speech translation recognition technology based on the LSTM (long short-term memory) recurrent neural network (RNN) algorithm is an important manifestation of computer intelligence. In recent years, many scholars have conducted research on speech translation recognition technology, including template matching and statistical pattern recognition. Each of these methods has its drawbacks. This paper discusses English speech recognition techniques based on the basic principles of RNNs. It also analyses their construction and application in practice, which can provide a useful reference for future researchers. The LSTM RNN is an intelligent system that differs from traditional pattern recognition methods. The greatest difference is that it simulates the information processing of the human brain and processes information intelligently in a distributed manner. It offers a variety of automatic recognition and extraction functions, such as storage, association, and retrieval, and shows high perception ability, especially for speech translation and recognition problems. This new neural network recognition system has a strong scientific basis and can store sound information in a decentralized manner, similar to the human brain. The LSTM RNN has been widely used in the speech recognition field due to its excellent performance in extraction and classification. The study found that the recognition accuracy of the original RNN generally remained between 48% and 54%, with a relatively high data loss rate. The accuracy of speech recognition based on the LSTM RNN was as high as 94%, and its information storage efficiency was high, which largely avoided repetitive processing. Voice data processing can be completed in as little as 4.5 s, which plays an important role in satisfying public demand and meeting the needs of social development.
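The abstract contrasts the original RNN with the LSTM RNN in terms of information storage. As a rough illustration only, and not the authors' implementation, the following sketch shows one step of a standard LSTM cell reduced to scalar input and scalar state. The function name `lstm_step`, the gate names, and all weight values are hypothetical and chosen only to show how a gated cell state can retain information across time steps.

```python
import math


def sigmoid(x):
    """Logistic function used by the three LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell (input size 1, hidden size 1).

    w maps each gate name to a tuple (input weight, recurrent weight, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value to write
    c = f * c_prev + i * g           # gated update of the memory cell
    h = o * math.tanh(c)             # hidden state passed to the next step
    return h, c


# Hypothetical weights: forget gate close to 1, input gate close to 0,
# so the cell keeps its stored value and largely ignores new inputs.
weights = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.5  # start with a value of 0.5 stored in the cell state
for x in [0.3, -0.8, 0.1]:
    h, c = lstm_step(x, h, c, weights)
```

With these illustrative weights the cell state stays close to its initial value after three steps, which is the kind of retention a plain RNN without gates does not provide by construction. A practical speech model would use vector-valued states and learned weight matrices rather than hand-set scalars.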
computer science, artificial intelligence