HMM-neural network monophone models for computer-based articulation training for the hearing impaired
M. Devarajan, Fansheng Meng, P. Hix, S. Zahorian
DOI: https://doi.org/10.1109/ICASSP.2003.1202373
2003-07-06
Abstract: A visual speech training aid for persons with hearing impairments has been developed using a Windows-based multimedia computer. Previous papers (Zahorian, S. et al., Int. Conf. on Spoken Language Processing, 2002; Zahorian and Nossair, Z.B., IEEE Trans. on Speech and Audio Processing, vol.7, no.4, p.414-25, 1999; Zimmer, A. et al., ICASSP, vol.6, p.3625-8, 1998; Zahorian and Jagharghi, A., J. Acoust. Soc. Amer., vol.94, no.4, p.1966-82, 1993) have described the signal processing steps and display options for giving real-time feedback about the quality of pronunciation for 10 steady-state American English monophthong vowels (/aa/, /iy/, /uw/, /ae/, /er/, /ih/, /eh/, /ao/, /ah/, and /uh/). This system is therefore referred to as a vowel articulation training aid (VATA). We now describe methods for developing a monophone-based hidden Markov model/neural network recognizer so that real-time visual feedback can be given about the quality of pronunciation of short words and phrases. Experimental results are reported that indicate a high degree of accuracy in labeling and segmenting the CVC (consonant-vowel-consonant) database developed for "training" the display.
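As a rough illustration of what labeling and segmenting a CVC token with monophone models can involve, the sketch below performs Viterbi forced alignment of a known phone sequence against per-frame phone scores. It is not the recognizer described in the paper: the function name `forced_align`, the one-state-per-phone topology, the uniform transition scores, and the toy posterior values are all assumptions made for this example. In a hybrid HMM/neural network system, the per-frame scores would typically come from a network's outputs.

```python
import math


def forced_align(log_post, phone_seq):
    """Viterbi forced alignment of a left-to-right phone sequence.

    log_post  : list of dicts, one per frame, mapping phone -> log score
                (assumed here to stand in for neural network outputs).
    phone_seq : known phone labels in order, e.g. ["b", "ae", "t"].
    Returns a list of (phone, start_frame, end_frame_exclusive).
    Assumption: one state per phone and uniform transition scores.
    """
    T, N = len(log_post), len(phone_seq)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phone")
    neg_inf = -math.inf

    # score[t][j]: best log score of any path that is in phone j at frame t
    score = [[neg_inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_post[0][phone_seq[0]]

    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else neg_inf
            if move > stay:
                best, back[t][j] = move, j - 1
            else:
                best, back[t][j] = stay, j
            if best > neg_inf:
                score[t][j] = best + log_post[t][phone_seq[j]]

    # Trace back from the last phone at the last frame.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()

    # Collapse the frame-level path into labeled segments.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((phone_seq[path[t - 1]], start, t))
            start = t
    return segments


def frame(favored, phones=("b", "ae", "t")):
    """Toy frame scores: 0.8 for the favored phone, 0.1 for the others."""
    return {p: math.log(0.8 if p == favored else 0.1) for p in phones}


# Six toy frames for a hypothetical CVC token "bat" (/b/ /ae/ /t/).
frames = [frame(p) for p in ["b", "b", "ae", "ae", "ae", "t"]]
segments = forced_align(frames, ["b", "ae", "t"])
print(segments)
```

Each returned tuple gives a phone label and the frame range assigned to it, which is the kind of segment boundary information a display could use to score individual sounds within a word. A practical system would add trained transition probabilities and multi-state phone models.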