Abstract: This article uses a Field Programmable Gate Array (FPGA) as the platform and IP cores to build a System on Programmable Chip (SOPC) English speech recognition system. The SOPC system follows a modular hardware design method: apart from the hardware acceleration module and its control module, which are developed independently, the other modules are implemented in software or with IP provided by the Xilinx development tools. The hardware acceleration IP adopts a top-down design method, runs multiple operation components in parallel, and uses pipelining, which speeds up data processing so that only one operation cycle is required to obtain a result. For the recognition algorithm, a more effective training algorithm is proposed, the Genetic Continuous Hidden Markov Model (GA_CHMM), which uses a genetic algorithm to train the CHMM directly. It searches for the optimal model by encoding the CHMM parameter values and applying selection, crossover, and mutation according to a fitness function. The optimal parameter values, once decoded, define the CHMM, and English speech recognition is then performed with the CHMM algorithm. This algorithm saves considerable training time and improves the recognition rate and speed. The paper also studies the optimization of the embedded system software. Through a fixed-point implementation of the software algorithm and optimization of system storage space, the real-time response time of the system is reduced from about 10 seconds to an average of 220 milliseconds. Optimizing the CHMM algorithm further improves real-time performance, and the average time to complete recognition is significantly shortened. At the same time, the system achieves a recognition rate above 90% when the English speech vocabulary is smaller than 200 words.
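
The genetic training loop named in the abstract (encode the model parameters, then apply selection, crossover, and mutation according to a fitness function) can be illustrated with a minimal sketch. This is not the paper's implementation. As a simplifying assumption, the chromosome here encodes only the mean and log standard deviation of a single Gaussian emission density, and the fitness is the log-likelihood of the observations; a full CHMM chromosome would also carry transition probabilities and mixture weights. All function names and hyperparameters are hypothetical.

```python
import math
import random

def log_likelihood(chrom, data):
    """Toy fitness: log-likelihood of data under one Gaussian (mean, log_std)."""
    mean, log_std = chrom
    std = math.exp(log_std)
    return sum(-0.5 * math.log(2 * math.pi) - log_std
               - 0.5 * ((x - mean) / std) ** 2 for x in data)

def tournament(pop, fits, rng, k=3):
    """Selection: return the fittest of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: fits[i])]

def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(chrom, rng, rate=0.2, scale=0.1):
    """Mutation: perturb each gene with probability `rate` by Gaussian noise."""
    return [g + rng.gauss(0.0, scale) if rng.random() < rate else g
            for g in chrom]

def train_ga(data, pop_size=40, generations=150, seed=0):
    """Evolve (mean, log_std) chromosomes and return the fittest one."""
    rng = random.Random(seed)
    # Assumed initial ranges for the two encoded parameters.
    pop = [[rng.uniform(-5, 5), rng.uniform(-1, 1)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [log_likelihood(c, data) for c in pop]
        best = pop[max(range(pop_size), key=lambda i: fits[i])]
        children = [best]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, fits, rng),
                              tournament(pop, fits, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    fits = [log_likelihood(c, data) for c in pop]
    return pop[max(range(pop_size), key=lambda i: fits[i])]
```

Calling `train_ga(observations)` returns the decoded parameter pair with the highest fitness found. Elitism is one design choice among several; it guarantees that the best fitness never decreases between generations.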

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

A Spiking LSTM Accelerator for Automatic Speech Recognition Application Based on FPGA

C-LSTM: Enabling Efficient LSTM Using Structured Compression Techniques on FPGAs

FPGA Implementation of LSTM Based on Automatic Speech Recognition

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity

A Highly Configurable 7.62GOP/s Hardware Implementation for LSTM

Acceleration of LSTM with Structured Pruning Method on FPGA

A Low-Latency LSTM Accelerator Using Balanced Sparsity Based on FPGA

Design and Implementation of Intelligent Speech Recognition System Based on FPGA

E-LSTM: Efficient Inference of Sparse LSTM on Embedded Heterogeneous System

A Power-Efficient Accelerator Based on FPGAs for LSTM Network

FPGA-based Accelerator for Long Short-Term Memory Recurrent Neural Networks

A Compression Strategy to Accelerate LSTM Meta-Learning on FPGA

Hardware-Guided Symbiotic Training for Compact, Accurate, yet Execution-Efficient LSTM

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs

A Compact and Configurable Long Short-Term Memory Neural Network Hardware Architecture

A 3.89-GOPS/mW Scalable Recurrent Neural Network Processor with Improved Efficiency on Memory and Computation

Implementation and Optimization of the Accelerator Based on FPGA Hardware for LSTM Network

Design and Implementation of Embedded Real-Time English Speech Recognition System Based on Big Data Analysis

LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs

Efficient FPGA Implementation of Convolutional Neural Networks and Long Short-Term Memory for Radar Emitter Signal Recognition