Interword Coarticulation Modeling for Continuous Speech Recognition
Mei-Yuh Hwang, Hsiao-Wuen Hon, Kai-Fu Lee
DOI: https://doi.org/10.1121/1.2026700
1989-01-01
The Journal of the Acoustical Society of America
Abstract: In large-vocabulary continuous speech recognition, subword units must be used for practical reasons. Context-dependent phone models have become a very successful class of subword units. These phone-sized models take into account the neighboring phonetic contexts, which strongly affect the realization of a phone. However, previous approaches have only considered intraword coarticulation, and have ignored interword coarticulation, which is very important in continuous speech, especially for short function words like “the” and “a.” This study extends triphone-based modeling to interword coarticulation modeling. A simple extension of triphones is problematic due to the sharply growing number of triphones. In order to contain this growth, a maximum-likelihood clustering procedure was introduced to reduce 7057 intraword and interword triphones to 1000 generalized triphones. Interword generalized triphones were incorporated into a large-vocabulary, speaker-independent, continuous speech recognizer, SPHINX [K. F. Lee and H. W. Hon, Large Vocabulary Speaker-Independent Continuous Speech Recognition (ICASSP, 1988)]. This improvement reduced the number of errors by as much as 44% on the 1000-word DARPA resource management task. This demonstrates the importance of interword coarticulation modeling, and the effectiveness of the methods used.
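The difference between intraword and interword triphone contexts can be illustrated with a minimal sketch. It is not the SPHINX implementation: the toy lexicon, the phone symbols, the boundary marker "#", the silence symbol "sil", and both function names are assumptions made only for this example. Intraword expansion cuts the context at every word edge, while interword expansion takes the edge contexts from the neighboring words, which matters most for short words such as "a" whose single phone would otherwise have no phonetic context at all.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; the phone symbols are illustrative assumptions.
LEXICON: Dict[str, List[str]] = {
    "the": ["dh", "ax"],
    "a": ["ax"],
    "ship": ["sh", "ih", "p"],
}

# A triphone is (left context, phone, right context).
Triphone = Tuple[str, str, str]


def intraword_triphones(words: List[str],
                        lexicon: Dict[str, List[str]] = LEXICON,
                        boundary: str = "#") -> List[Triphone]:
    """Expand each word separately; contexts stop at the word boundary."""
    result: List[Triphone] = []
    for word in words:
        phones = lexicon[word]
        for i, phone in enumerate(phones):
            left = phones[i - 1] if i > 0 else boundary
            right = phones[i + 1] if i + 1 < len(phones) else boundary
            result.append((left, phone, right))
    return result


def interword_triphones(words: List[str],
                        lexicon: Dict[str, List[str]] = LEXICON,
                        silence: str = "sil") -> List[Triphone]:
    """Expand the whole utterance; contexts cross word boundaries."""
    flat = [phone for word in words for phone in lexicon[word]]
    # Pad with silence so the first and last phones also have two contexts.
    padded = [silence] + flat + [silence]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


if __name__ == "__main__":
    utterance = ["a", "ship"]
    print(intraword_triphones(utterance))
    print(interword_triphones(utterance))
```

Because the set of possible neighbors at a word edge is much larger than the set of neighbors inside a word, the number of distinct triphones grows sharply under interword expansion, which is the growth that the clustering into generalized triphones is meant to contain.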