Improved context-dependent acoustic modeling for continuous Chinese speech recognition

Jiyong Zhang, Thomas Fang Zheng, Jing Li, Chunhua Luo, Guoliang Zhang
DOI: https://doi.org/10.21437/eurospeech.2001-196
2001-01-01
Abstract: This paper describes a new framework for context-dependent (CD) Initial/Final (IF) acoustic modeling using decision-tree-based state tying for continuous Chinese speech recognition. The Extended Initial/Final (XIF) set, which outperforms the standard IF set, is chosen as the basic speech recognition unit (SRU) set according to the characteristics of the Chinese language. An adaptive mixture-increasing strategy is applied when splitting the single Gaussian in each tied state into a Gaussian mixture after the decision tree has been constructed. Our experimental results show that these two improvements benefit acoustic modeling for Chinese speech recognition and that the CD XIF model outperforms the baseline syllable model by over 30%.
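The mixture-increasing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the adaptive criterion, so the rule used here (more components for tied states with more training frames, via the hypothetical `target_mixtures`, `frames_per_mix`, and `max_mix`) is an assumption, as is the mean-perturbation split borrowed from common "mix-up" practice. It is not the authors' method.

```python
import math

def target_mixtures(occupancy, max_mix=8, frames_per_mix=100.0):
    # Hypothetical adaptive rule (assumption, not from the paper):
    # one component per `frames_per_mix` training frames, capped at `max_mix`.
    return max(1, min(max_mix, int(occupancy // frames_per_mix)))

def split_heaviest(weights, means, variances, perturb=0.2):
    # Split the component with the largest weight into two components,
    # shifting the mean by +/- perturb * std and halving the weight.
    k = max(range(len(weights)), key=lambda i: weights[i])
    w = weights[k] / 2.0
    std = [math.sqrt(v) for v in variances[k]]
    up = [m + perturb * s for m, s in zip(means[k], std)]
    down = [m - perturb * s for m, s in zip(means[k], std)]
    new_weights = weights[:k] + [w, w] + weights[k + 1:]
    new_means = means[:k] + [up, down] + means[k + 1:]
    new_vars = (variances[:k]
                + [list(variances[k]), list(variances[k])]
                + variances[k + 1:])
    return new_weights, new_means, new_vars

def mix_up(mean, variance, occupancy, **rule_args):
    # Start from the single diagonal Gaussian of a tied state and split
    # until the state-specific target is reached. A real trainer would
    # re-estimate parameters between splits; that step is omitted here.
    weights, means, variances = [1.0], [list(mean)], [list(variance)]
    n = target_mixtures(occupancy, **rule_args)
    while len(weights) < n:
        weights, means, variances = split_heaviest(weights, means, variances)
    return weights, means, variances
```

For example, `mix_up([0.0, 1.0], [1.0, 4.0], occupancy=450.0)` yields four equally weighted components under these assumed defaults, while a sparsely observed state with `occupancy=30.0` keeps its single Gaussian.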