A GPU-based Parallel WFST Decoder on Nnet3

Yong Wang, Jie Liu, Chen Zhou, Zhengbin Pang, Shengguo Li, Chunye Gong, Xinbiao Gan, Yurong Li
DOI: https://doi.org/10.1063/1.5090751
2019-01-01
AIP Conference Proceedings
Abstract: One performance-intensive part of automatic speech recognition is weighted finite-state transducer (WFST) decoding. To address this, we extend parallel Graphics Processing Unit (GPU) computing to the decoding stage. We describe an extension of the Kaldi speech recognition research toolkit that supports WFST decoding for Kaldi neural networks using the CUDA toolkit. We also extend an efficient parallel Viterbi beam decoding algorithm to reduce the Real Time Factor (RTF) of speech recognition. With our optimizations, we achieve a 2.3x speedup in decoding on the AISHELL corpus. We also implement an nnet3 decoder that improves real-time speed without increasing the word error rate.
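The two quantities the abstract relies on, the real-time factor and Viterbi beam decoding, can be illustrated with a minimal sequential sketch. This is not the paper's GPU algorithm and not Kaldi's API: the function names, the toy graph, and the cost values are all hypothetical and chosen only for illustration. It assumes RTF is defined in the usual way, as decoding time divided by audio duration, and that the decoder keeps one best-cost token per state and discards tokens whose cost exceeds the frame's best cost plus the beam.

```python
import math

def real_time_factor(decode_seconds, audio_seconds):
    """RTF = decoding time / audio duration; below 1.0 is faster than real time."""
    return decode_seconds / audio_seconds

def viterbi_beam(arcs, start, finals, frame_costs, beam):
    """Token-passing Viterbi search with beam pruning (sequential toy version).

    arcs: dict mapping state -> list of (next_state, label, arc_cost)
    frame_costs: one dict per frame mapping label -> acoustic cost
    Returns the lowest total cost ending in a final state, or math.inf.
    """
    tokens = {start: 0.0}  # best cost found so far for each active state
    for costs in frame_costs:
        new_tokens = {}
        for state, cost in tokens.items():
            for nxt, label, arc_cost in arcs.get(state, []):
                total = cost + arc_cost + costs[label]
                # Viterbi recombination: keep only the cheapest token per state.
                if total < new_tokens.get(nxt, math.inf):
                    new_tokens[nxt] = total
        if not new_tokens:
            return math.inf
        # Beam pruning: drop tokens too far behind the best token of this frame.
        best = min(new_tokens.values())
        tokens = {s: c for s, c in new_tokens.items() if c <= best + beam}
    return min((c for s, c in tokens.items() if s in finals), default=math.inf)

# Hypothetical toy graph: two competing paths from state 0 to final state 3.
toy_arcs = {
    0: [(1, "a", 0.5), (2, "b", 0.25)],
    1: [(3, "c", 0.25)],
    2: [(3, "c", 2.0)],
}
toy_frames = [
    {"a": 1.0, "b": 1.0, "c": 0.5},
    {"a": 1.0, "b": 1.0, "c": 0.5},
]

wide_beam_cost = viterbi_beam(toy_arcs, 0, {3}, toy_frames, beam=1.0)
narrow_beam_cost = viterbi_beam(toy_arcs, 0, {3}, toy_frames, beam=0.125)
```

In this toy graph the narrow beam prunes the path that would have turned out cheapest, so `narrow_beam_cost` is higher than `wide_beam_cost`. That is the general trade-off of beam decoding: a tighter beam does less work per frame but can lose accuracy. In this sketch the per-frame loops over tokens and arcs are the part that a parallel decoder would distribute across threads.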