SPEAKER NORMALIZATION AND NOVEL ROBUST SPEECH FEATURE BASED ON MELLIN TRANSFORM

Chen Jingdong (陈景东), Xu Bo (徐波), Huang Taiyi (黄泰翼)
2000-01-01
ACTA AUTOMATICA SINICA
Abstract: One major source of interspeaker variability in speaker-independent (SI) speech recognition is variation in vocal tract shape, especially vocal tract length (VTL), among individual speakers. If the vocal tract is modeled as a uniform tube of length L, the formant frequencies of utterances of a given sound are inversely proportional to L. Since the VTL can vary from approximately 13 cm for females to over 18 cm for males, formant center frequencies can vary by as much as 25% among speakers. Because of this variability, state-of-the-art SI speech recognizers perform poorly for outlier speakers whose vocal tract shapes differ significantly from those of the speakers in the training set. To reduce the degradation in recognition performance caused by VTL variation among speakers, this paper investigates two methods. The first removes the variability through speaker normalization. The second extracts a new feature based on the Mellin transform (MT). Because of the scale invariance property of the MT, the new feature is insensitive to VTL variation among speakers. Experiments show that both methods improve the performance of an SI recognizer, and that the latter is more effective than the former.
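The scale invariance mentioned in the abstract can be illustrated numerically. Under the uniform-tube model, changing the VTL by a factor alpha rescales the frequency axis, so a spectral envelope S(f) becomes S(alpha * f). The Mellin transform M(s) = ∫ S(f) f^(s-1) df, evaluated at s = j*omega, then changes only by a phase factor alpha^(-j*omega), so its magnitude is unchanged. The sketch below is a minimal illustration of that property only, not the feature extraction used in the paper: the function names, the toy two-peak envelope, and the integration grid are all assumptions made for the example.

```python
import cmath
import math


def mellin_magnitude(f, omega, u_min=-8.0, u_max=8.0, n=4096):
    """Approximate |M(j*omega)| for M(s) = integral_0^inf f(x) x^(s-1) dx.

    Substituting x = exp(u) turns the Mellin integral into a Fourier
    integral over u, approximated here by a plain Riemann sum.
    The grid limits and size are illustrative assumptions.
    """
    du = (u_max - u_min) / n
    total = 0j
    for k in range(n):
        u = u_min + k * du
        total += f(math.exp(u)) * cmath.exp(1j * omega * u)
    return abs(total) * du


def envelope(x):
    """Toy two-peak envelope (hypothetical, not taken from the paper)."""
    log_x = math.log(x)
    peak_a = math.exp(-(log_x - 0.0) ** 2 / 0.02)
    peak_b = 0.6 * math.exp(-(log_x - 1.0) ** 2 / 0.02)
    return peak_a + peak_b


# Ratio of the two VTL values quoted in the abstract (18 cm vs. 13 cm).
alpha = 18.0 / 13.0
omegas = [0.5 * k for k in range(21)]

original = [mellin_magnitude(envelope, w) for w in omegas]
scaled = [mellin_magnitude(lambda x: envelope(alpha * x), w) for w in omegas]

# Largest difference between the two magnitude profiles.
max_diff = max(abs(a - b) for a, b in zip(original, scaled))
print("max |difference| between magnitude profiles:", max_diff)
```

The two magnitude profiles should agree up to numerical integration error, even though the scaled envelope has its peaks at different positions. Exact invariance holds only for purely imaginary s; for s with a nonzero real part the magnitude is additionally multiplied by a constant that depends on alpha.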