Speaker Normalization Based on the Generalized Time-Frequency Representation and Mellin Transform

DM Jiang, RC Zhao
DOI: https://doi.org/10.1109/icosp.2000.891628
2000-01-01
Abstract: For vocal tract length normalization in speaker-independent speech recognition, a novel feature extraction method is proposed that combines the generalized time-frequency representation with cone-shaped kernel (CK-GTFR) and the Mellin transform. The GTFR is superior to other representations in suppressing cross terms while providing good time and frequency resolution simultaneously. The Mellin transform makes the features insensitive to differences in vocal tract length. F-ratio tests show that the proposed features have higher separation ability than the FFT cepstrum and the FFT-Mellin cepstrum, and are superior to the Mel cepstrum in most cases.
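The scale insensitivity attributed to the Mellin transform can be illustrated with a minimal sketch. It is not the paper's CK-GTFR pipeline: it assumes a toy single-formant spectral envelope and approximates the Mellin-transform magnitude by resampling the spectrum on a logarithmic frequency axis and taking the FFT magnitude. The names `mellin_magnitude` and `toy_envelope`, the grid sizes, and the scale factor 1.2 are all hypothetical choices made for illustration.

```python
import numpy as np

def mellin_magnitude(spectrum, freqs, n_log=256):
    """Approximate Mellin-transform magnitude of a spectrum.

    On a logarithmic frequency axis a scaling f -> a*f becomes a shift,
    and the FFT magnitude is insensitive to that shift.
    """
    log_grid = np.linspace(np.log(freqs[0]), np.log(freqs[-1]), n_log)
    warped = np.interp(np.exp(log_grid), freqs, spectrum)
    return np.abs(np.fft.rfft(warped))

def toy_envelope(freqs, formant_hz, width=0.15):
    # Hypothetical envelope: one Gaussian bump in log-frequency.
    return np.exp(-0.5 * ((np.log(freqs) - np.log(formant_hz)) / width) ** 2)

freqs = np.linspace(100.0, 8000.0, 4096)
ref = toy_envelope(freqs, 1000.0)      # reference "speaker"
scaled = toy_envelope(freqs, 1200.0)   # same shape, frequency axis scaled by 1.2

m_ref = mellin_magnitude(ref, freqs)
m_scaled = mellin_magnitude(scaled, freqs)

# Largest difference between the raw envelopes versus the relative
# difference between their Mellin-type magnitudes.
raw_diff = np.max(np.abs(ref - scaled))
mellin_diff = np.max(np.abs(m_ref - m_scaled)) / np.max(m_ref)
print(f"raw difference: {raw_diff:.3f}, Mellin-magnitude difference: {mellin_diff:.2e}")
```

In this sketch the two raw envelopes differ substantially, while their Mellin-type magnitudes nearly coincide. The invariance holds only approximately in practice, because real spectra do not decay to zero at the edges of the analysis band.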