A C/V Segmentation Method for Mandarin Speech Based on Multiscale Fractal Dimension

Fan Wang, Fang Zheng, Wenhu Wu
DOI: https://doi.org/10.21437/icslp.2000-895
2000-01-01
Abstract: This paper proposes a new algorithm for Consonant/Vowel (C/V) segmentation of Mandarin speech based on fractal theory. The method searches for the transient region between the consonant and vowel parts of a Mandarin syllable, which in general is a consonant followed by a vowel. The Multiscale Fractal Dimension set (MFD) consists of the fractal dimensions computed at multiple maximum resolutions. The r-variance of the MFD (the degree of difference among all elements of an MFD) alone clearly distinguishes the stable phonemes from their transient region, so the algorithm can directly take the speech frame with the minimum r-variance of the MFD as the C/V segmentation boundary. A segmentation accuracy of 95.2% is obtained on a clean test corpus, and 82.3% in a noisy environment with an SNR of 10 dB. These results indicate that the new C/V segmentation algorithm is suitable for continuous Mandarin speech recognition.
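The frame search described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract does not give the definitions: the fractal dimension is estimated with a simplified Higuchi-style curve-length fit, "multiple maximum resolutions" is read as several maximum lags (`k_maxes`), and the r-variance is approximated by the ordinary population variance of the MFD elements. The function names are hypothetical and are not taken from the paper.

```python
import math
import statistics


def curve_length(frame, k):
    """Simplified Higuchi-style curve length of a frame at lag k (assumed estimator)."""
    n = len(frame)
    total = sum(abs(frame[i + k] - frame[i]) for i in range(n - k))
    return total * (n - 1) / ((n - k) * k * k)


def fractal_dimension(frame, k_max):
    """Least-squares slope of log L(k) against log(1/k) for k = 1..k_max (k_max >= 2)."""
    xs, ys = [], []
    for k in range(1, k_max + 1):
        length = curve_length(frame, k)
        if length <= 0.0:
            # A flat frame has no measurable roughness; treat it as dimension 1.
            return 1.0
        xs.append(math.log(1.0 / k))
        ys.append(math.log(length))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def mfd(frame, k_maxes=(4, 8, 16)):
    """Multiscale set: one dimension estimate per maximum lag (assumed reading)."""
    return [fractal_dimension(frame, k_max) for k_max in k_maxes]


def r_variance(values):
    """Stand-in for the paper's r-variance: population variance of the MFD elements."""
    return statistics.pvariance(values)


def find_boundary(signal, frame_len, hop, k_maxes=(4, 8, 16)):
    """Return (index of the frame with minimum r-variance, per-frame r-variances)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    variances = [r_variance(mfd(signal[s:s + frame_len], k_maxes)) for s in starts]
    best = min(range(len(variances)), key=variances.__getitem__)
    return best, variances
```

With one syllable stored as a list of samples, `find_boundary(samples, frame_len, hop)` returns a frame index, and multiplying it by `hop` gives the sample offset of the candidate boundary. The minimum criterion follows the abstract's wording. Frame length, hop size, and the set of maximum lags are free parameters that the abstract does not specify.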