An Open-Source Library of 2D-GMM-HMM Based on Kaldi Toolkit and Its Application to Handwritten Chinese Character Recognition

Jiefeng Ma, Zirui Wang, Jun Du
DOI: https://doi.org/10.1007/978-3-030-87355-4_20
2021-01-01
Abstract: As an open-source toolkit built on the 1D-HMM framework, Kaldi is widely used in many signal processing tasks. However, for data with complex spatial structure, e.g. in image-related tasks, a 2D-HMM is more suitable because it allows free transitions between hidden states in both the horizontal and vertical directions. Although the 2D-HMM framework was proposed years ago, its complexity means there is still no efficient open-source toolkit for further research. In this paper, we present a highly efficient code library for 2D-GMM-HMM based on the Kaldi toolkit, together with implementation details. To demonstrate its effectiveness, we apply 2D-GMM-HMM to the handwritten Chinese character recognition (HCCR) task. Experiments on a 50-class HCCR task show that the 2D-GMM-HMM system has clear advantages over the 1D-GMM-HMM system in recognition accuracy and modeling precision. Moreover, visual analysis shows that 2D-GMM-HMM can segment Chinese characters into basic components such as radicals via hidden states in both the horizontal and vertical directions, whereas 1D-GMM-HMM can only segment in the horizontal direction. The 2D-GMM-HMM library and its HCCR recipe are publicly available at https://github.com/jfma-USTC/2DHMM.