Exploiting Prosodic and Lexical Features for Tone Modeling in A Conditional Random Field Framework

Hongxiu Wei, Xinhao Wang, Hao Wu, Dingsheng Luo, Xihong Wu
DOI: https://doi.org/10.1109/icassp.2008.4518668
2008-01-01
Abstract: Tonal cues play an important role in distinguishing ambiguous words in Mandarin speech recognition. This paper explores an innovative tone modeling framework using prosodic and lexical features, as well as syllable context information. A discriminative model, namely a Conditional Random Field (CRF), is adopted, which is sufficiently flexible to handle multiple interacting features and long-range dependencies among observations. After the first-pass search of a recognition system, the CRF-based tone models are employed to rerank the N-best hypotheses according to tonal scores, which represent the correctness of the tone sequence given each candidate hypothesis and the observed speech signal. Experimental results show that the tonal cues help to achieve 7.8% and 8.6% relative reductions in character error rate on two widely used Mandarin speech recognition tasks, the Hub-4 test and the 863 test.
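The reranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the tonal score is combined with the first-pass score, so the weighted sum of log-domain scores used here is an assumption, as are the names `Hypothesis`, `rerank`, and `tone_weight` and the toy score values. The tonal scores are taken as given inputs; computing them with a CRF is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # log-domain score from the first-pass search (assumed)
    tonal_score: float       # log-domain score of the tone sequence (assumed given)


def rerank(hypotheses, tone_weight=1.0):
    """Return hypotheses sorted best-first by a weighted sum of both scores.

    The linear combination is an illustrative assumption, not the paper's
    stated method.
    """
    return sorted(
        hypotheses,
        key=lambda h: h.first_pass_score + tone_weight * h.tonal_score,
        reverse=True,
    )


# Toy N-best list with made-up scores, for illustration only.
nbest = [
    Hypothesis("hyp_a", -10.0, -5.0),
    Hypothesis("hyp_b", -10.5, -1.0),
    Hypothesis("hyp_c", -12.0, -0.5),
]

best_with_tone = rerank(nbest, tone_weight=1.0)[0]
best_without_tone = rerank(nbest, tone_weight=0.0)[0]
```

In this toy list, the first-pass ranking alone prefers `hyp_a`, while adding the tonal score promotes `hyp_b`. In practice a weight such as `tone_weight` would presumably be tuned on held-out data.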