Robust F0 Modeling for Mandarin Speech Recognition in Noise

Sheng Qiang, Yao Qian, Frank K. Soong, Congfu Xu
DOI: https://doi.org/10.21437/interspeech.2007-503
2007-01-01
Abstract: The F0 contour plays an important role in recognizing spoken tonal languages such as Mandarin Chinese. However, the discontinuity of F0 at voiced/unvoiced transitions has traditionally been a bottleneck in building a succinct statistical tone model for automatic speech recognition. By successfully applying the Multi-Space Distribution (MSD) to tone modeling, we recently reported a 24% relative reduction in tonal syllable errors on a Mandarin speech database. In this paper, we further test MSD on a noisy, continuous Mandarin digit recognition task in which eight types of noise are added to clean speech at five SNRs. The experimental results show that our MSD-based digit models significantly improve recognition performance in noise over a baseline system. Relative digit error rate reductions of 19.1% and 15.0% are obtained for noises seen and unseen in the training data, respectively. The improvements also exceed those of other reference systems that incorporate F0 information.
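The abstract does not spell out the MSD formulation. As a minimal sketch, assuming the common two-space MSD setup for F0 (a one-dimensional continuous space for voiced frames and a zero-dimensional space for unvoiced frames, with space weights summing to 1), the output probability of one HMM state can be computed as below. The function names, the single Gaussian per state, and the use of log F0 as input are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Optional


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Univariate Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def msd_output_prob(log_f0: Optional[float], w_voiced: float,
                    mean: float, var: float) -> float:
    """Hypothetical two-space MSD output probability for one HMM state.

    Space 1: one-dimensional continuous space for voiced frames (log F0),
             modeled here by a single Gaussian (an assumption).
    Space 0: zero-dimensional space for unvoiced frames; its "density"
             is defined to be 1, so only the space weight remains.
    """
    if not 0.0 <= w_voiced <= 1.0:
        raise ValueError("w_voiced must lie in [0, 1]")
    w_unvoiced = 1.0 - w_voiced
    if log_f0 is None:
        # Unvoiced frame: no F0 value exists, so only the
        # zero-dimensional space contributes.
        return w_unvoiced
    # Voiced frame: weight of the voiced space times its Gaussian density.
    return w_voiced * gaussian_pdf(log_f0, mean, var)


if __name__ == "__main__":
    # A short hypothetical frame sequence: voiced, unvoiced, voiced.
    frames = [math.log(200.0), None, math.log(180.0)]
    state = dict(w_voiced=0.8, mean=math.log(190.0), var=0.04)
    total_log_prob = sum(math.log(msd_output_prob(f, **state)) for f in frames)
    print(total_log_prob)
```

Because unvoiced frames are scored by a weight rather than by an interpolated F0 value, voiced and unvoiced frames can share one state without the discontinuity the abstract describes.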