Abstract: This paper presents a decoder we developed that statically optimizes part of the knowledge sources while handling the others dynamically. The lexicon, phonetic contexts and acoustic model are statically integrated into a memory-efficient state network, while the language model (LM) is incorporated dynamically, on the fly, by means of extended tokens. The novelties of our approach to constructing the state network are:

(1) introducing two layers of dummy nodes to cluster the cross-word (CW) context-dependent fan-in and fan-out triphones;
(2) introducing a so-called “WI layer” that stores the word identities, and placing the nodes of this layer in the non-shared middle part of the network;
(3) optimizing the network at the state level through a thorough forward and backward node-merging process.

The state network is organized as a multi-layer structure so that token propagation can be handled separately at each layer. Several techniques exploit the characteristics of the state network to improve search efficiency: LM look-ahead, an LM cache and beam pruning. In particular, a layer-dependent beam pruning method is proposed to further reduce the search space. It takes into account the neck-like shape of the network at the WI layer and the reduced variety of word endings. This allows a tighter beam without introducing many search errors. In addition, LM compression, lattice-based bookkeeping and lattice garbage collection are employed to reduce memory requirements. Experiments are carried out on a Mandarin spontaneous speech recognition task in which the decoder uses a trigram LM and CW triphone models. A comparison with HDecode from the HTK toolkit shows that our decoder runs five times faster with half the memory footprint, with a performance deviation within 1%.
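The abstract combines two search ideas: extended tokens that carry LM history, and a beam whose width depends on the network layer. The Python sketch below illustrates both in their simplest form. It rests on our own assumptions, not on details given in the paper:

- the layer labels `ordinary` and `wi`, the beam widths, and the functions `recombine` and `layer_dependent_prune` are all hypothetical;
- each layer is assumed to be pruned against its own best token.

The authors' actual pruning criterion and data structures may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical extended token: the path score plus the LM history, so a
    trigram LM can be applied on the fly instead of being compiled into the
    static state network."""
    state: int                   # node id in the static state network
    layer: str                   # layer label of that node (hypothetical names)
    score: float                 # accumulated log score, higher is better
    lm_history: Tuple[str, ...]  # up to two previous words for a trigram LM


def recombine(tokens: List[Token]) -> List[Token]:
    """Viterbi recombination: keep only the best token per (state, LM history).
    Tokens in the same state but with different histories stay separate."""
    best: Dict[Tuple[int, Tuple[str, ...]], Token] = {}
    for tok in tokens:
        key = (tok.state, tok.lm_history)
        if key not in best or tok.score > best[key].score:
            best[key] = tok
    return list(best.values())


def layer_dependent_prune(tokens: List[Token], beams: Dict[str, float],
                          default_beam: float) -> List[Token]:
    """Assumed criterion: prune each layer against its own best score, using a
    per-layer beam width (default_beam for layers not listed in beams)."""
    layer_best: Dict[str, float] = {}
    for tok in tokens:
        if tok.layer not in layer_best or tok.score > layer_best[tok.layer]:
            layer_best[tok.layer] = tok.score
    return [tok for tok in tokens
            if tok.score >= layer_best[tok.layer] - beams.get(tok.layer, default_beam)]


if __name__ == "__main__":
    frame_tokens = [
        Token(1, "ordinary", -10.0, ("a", "b")),
        Token(1, "ordinary", -12.0, ("a", "b")),  # same state and history: merged away
        Token(1, "ordinary", -11.0, ("c", "b")),  # same state, other history: kept
        Token(2, "ordinary", -25.0, ("a", "b")),  # outside the 12.0 beam: pruned
        Token(5, "wi", -20.0, ("a",)),
        Token(6, "wi", -27.0, ("b",)),            # outside the tighter 5.0 beam: pruned
    ]
    beams = {"ordinary": 12.0, "wi": 5.0}         # hypothetical beam widths
    for tok in layer_dependent_prune(recombine(frame_tokens), beams, default_beam=12.0):
        print(tok.state, tok.layer, tok.score, tok.lm_history)
    # prints:
    # 1 ordinary -10.0 ('a', 'b')
    # 1 ordinary -11.0 ('c', 'b')
    # 5 wi -20.0 ('a',)
```

In this sketch the WI layer gets a narrower beam than the other layers. That reflects the abstract's observation that this layer forms a neck with little variety in the surviving hypotheses. It is one plausible reading, not the authors' tuned setting.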

A GPU-based Parallel WFST Decoder on Nnet3

GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition

GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition

A Study of Large Vocabulary Speech Recognition Decoding Using Finite-State Graphs

Efficient Decoding Self-Attention for End-to-end Speech Synthesis

Parallel Decoding for Non-recursive Convolutional Codes and Its Enhancement Through Artificial Neural Networks

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

High-Throughput and Memory-Efficient Parallel Viterbi Decoder for Convolutional Codes on GPU

Performance Evaluation of Channel Decoding with Deep Neural Networks

WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition

A One-Pass Real-Time Decoder Using Memory-Efficient State Network

GPU Based Real-Time UHD Intra Decoding for AVS3

Implementation of Accelerated BCH Decoders on GPU

Speech Super-Resolution Using Parallel WaveNet

Efficient One-Pass Decoding with Nnlm for Speech Recognition

BER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU

A Scalable Graph Neural Network Decoder for Short Block Codes

CUDA Acceleration for AVS2 Loop Filtering

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

R-BI: Regularized Batched Inputs enhance Incremental Decoding Framework for Low-Latency Simultaneous Speech Translation

Accelerating the Training of HTK on GPU with CUDA