CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches Towards Data Efficiency and Low Latency

Keyu An, Hongyu Xiang, Zhijian Ou
DOI: https://doi.org/10.21437/interspeech.2020-2732
2020-01-01
Abstract: In this paper, we present a new open-source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show that CAT obtains state-of-the-art results, comparable to the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to perform streaming ASR without accuracy degradation. We hope CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community and can be further explored and improved.
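To make the term CTC-CRF a little more concrete, here is a toy Python sketch. It assumes the usual CTC-CRF formulation: a frame-level path π over the labels plus a blank gets the potential φ(π, x) = Σ_t log p(π_t | x) + log p_LM(B(π)), where B is the CTC collapse (merge repeated symbols, then drop blanks); p(l | x) is the sum of exp(φ) over the paths that collapse to l, divided by the same sum over all paths. The frame posteriors, the bigram table, and every function name below are hypothetical. The denominator is computed here by brute-force enumeration, which only works for a handful of frames; a practical system would use dynamic programming over a WFST instead.

```python
import itertools
import math

BLANK = "<b>"  # hypothetical blank symbol


def ctc_collapse(path):
    """CTC mapping B: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return tuple(out)


def toy_bigram_lm(labels):
    """Hypothetical label bigram 'LM' returning log p_LM(labels)."""
    bigram = {("<s>", "a"): 0.5, ("a", "b"): 0.6, ("b", "</s>"): 0.7}
    seq = ("<s>",) + labels + ("</s>",)
    # Unseen bigrams fall back to a flat 0.1; the CRF normalizes globally,
    # so this potential does not need to be a normalized distribution.
    return sum(math.log(bigram.get(pair, 0.1)) for pair in zip(seq, seq[1:]))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_crf_label_logprob(frame_logp, symbols, lm_logp):
    """Brute-force log p(l | x) for every label sequence l reachable in T frames."""
    per_label = {}
    for path in itertools.product(symbols, repeat=len(frame_logp)):
        label = ctc_collapse(path)
        # phi(pi, x) = sum_t log p(pi_t | x) + log p_LM(B(pi))
        phi = sum(frame_logp[t][s] for t, s in enumerate(path)) + lm_logp(label)
        per_label.setdefault(label, []).append(phi)
    # Denominator: log-sum-exp of the potential over all paths.
    log_z = log_sum_exp([v for vs in per_label.values() for v in vs])
    # Numerator per label: log-sum-exp over the paths that collapse to it.
    return {label: log_sum_exp(vs) - log_z for label, vs in per_label.items()}


# Hypothetical per-frame posteriors p(pi_t | x) for T = 3 frames.
frame_logp = [
    {BLANK: math.log(0.2), "a": math.log(0.7), "b": math.log(0.1)},
    {BLANK: math.log(0.5), "a": math.log(0.3), "b": math.log(0.2)},
    {BLANK: math.log(0.2), "a": math.log(0.1), "b": math.log(0.7)},
]
posterior = ctc_crf_label_logprob(frame_logp, [BLANK, "a", "b"], toy_bigram_lm)
best = max(posterior, key=posterior.get)
print(best, math.exp(posterior[best]))
```

The enumeration grows as |symbols|^T, so it is only useful for checking the definition on tiny inputs. Editing the bigram table shifts probability mass between label sequences while the frame posteriors stay fixed, which shows how the label LM enters the sequence-level potential.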