Abstract: Recurrent neural networks (RNNs) combined with connectionist temporal classification (CTC) are widely used in sequence-to-sequence tasks, including automatic speech recognition (ASR), lipreading, and scene text recognition (STR). In these systems, CTC-trained RNNs usually require a dedicated CTC decoder after the output layer. Many existing CTC-trained RNN inference systems compute the RNN on an FPGA and decode its outputs on a CPU. However, as FPGA-based RNN hardware accelerators have become faster, existing CPU-based CTC decoders can no longer meet their latency requirements. To resolve this issue, this paper proposes an efficient hardware architecture for the CTC beam search decoder, based on the decoding method reported in our previous work. Experimental results show that the system latency per sample of the CTC decoder is only 7.19 µs on a Xilinx xc7vx1140tflg1930-1 FPGA platform, which is lower than the latency of state-of-the-art RNN accelerators. We also implement the original algorithm on the same FPGA platform. Comparison results show that the improved decoder reduces the system latency per sample by 63.67%, the LUTRAMs by 97.44%, the FFs by 79.55%, and the DSPs by 50%. To the best of our knowledge, this is the first work on a hardware implementation of the CTC beam search decoder.
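For readers unfamiliar with the decoding step, the following is a minimal software sketch of the textbook CTC prefix beam search. It is not the improved decoding method or the hardware architecture proposed in the paper, neither of which is described in this abstract. The function name `ctc_beam_search`, the parameters `beam_width` and `blank`, and the toy probabilities are assumptions made for illustration.

```python
from collections import defaultdict


def ctc_beam_search(probs, beam_width=4, blank=0):
    """Textbook CTC prefix beam search (illustrative sketch only).

    probs: list of T rows; row t holds the per-symbol probabilities
           emitted by the network at time step t.
    Returns the most probable label sequence and its probability.
    """
    # Each prefix maps to (p_blank, p_non_blank): the total probability of
    # all alignments that yield this prefix and end in blank / non-blank.
    beams = {(): (1.0, 0.0)}
    for row in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for s, p in enumerate(row):
                if s == blank:
                    # A blank leaves the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and s == prefix[-1]:
                    # Repeated symbol: collapses into the same prefix
                    # unless a blank separated the two occurrences.
                    next_beams[prefix][1] += p_nb * p
                    next_beams[prefix + (s,)][1] += p_b * p
                else:
                    # A new symbol extends the prefix.
                    next_beams[prefix + (s,)][1] += (p_b + p_nb) * p
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(),
                        key=lambda kv: kv[1][0] + kv[1][1],
                        reverse=True)
        beams = {k: tuple(v) for k, v in ranked[:beam_width]}
    best_prefix, (p_b, p_nb) = max(beams.items(), key=lambda kv: sum(kv[1]))
    return list(best_prefix), p_b + p_nb


# Toy input (assumed values): two time steps, symbol 0 is blank, symbol 1 is a label.
toy_probs = [[0.6, 0.4], [0.6, 0.4]]
best, score = ctc_beam_search(toy_probs)
```

On this toy input the single most likely alignment is blank-blank, yet the beam search returns the label sequence `[1]`, because the three alignments that collapse to it carry more total probability. The sort and prune at every time step is the kind of work a CPU-based decoder has to perform per frame.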

An Improved Algorithm of CTI and Its Implementation on Hardware

A Parallel Stereo Matching Algorithm Core for FPGA Modeled by DSP Builder

Improved real-time correlation-based FPGA stereo vision system

An Algorithm for Chrominance Transient Improvement and Its Implementation

A new VLSI design for Viterbi decoder based on ASIP

A Low-Latency and Low-Complexity Hardware Architecture for CTC Beam Search Decoding

Implementation of MDCT algorithm based on FPGA in audio coding

Multi-decision Based Impulse De-noising Algorithm and Hardware Implementation

The FPGA implementation of narrowband active noise control system

Design and implementation of NBI suppression algorithm based on FPGA

FPGA implementation of CIS speech processing strategy for Cochlear Implants

Design and Implementation of ACELP Vocoder Based on FPGA

Implementation of High Quality 0.6 Kb/s Vocoder System Based on TMS320VC55x

A Low-Noise CTIA-based Pixel with CDS for SWIR Focal Plane Arrays

Two methods of design and implementation of ACELP vocoder

The Implementation of 2D-DCT Based on FPGA

An Efficient Implementation Algorithm of IIR Filter Based on CPLD

A 20ps resolution wave union FPGA TDC with on-chip real time correction

An Algorithm and Its VLSI Implementation of Channel Estimation and Equalization for CMMB System

Hardware Implement of LSP Parameter Quantization Algorithm in G.729

Real-Time Implementation of an Efficient Speech Enhancement Algorithm for Digital Hearing Aids