A Study on the Robustness of Pitch Range Estimation from Brief Speech Segments

Wenjie Peng, Kaiqi Fu, Wei Zhang, Yanlu Xie, Jinsong Zhang
DOI: https://doi.org/10.1109/ialp48816.2019.9037713
2020-01-01
International Journal of Asian Language Processing
Abstract: Pitch range estimation from brief speech segments is important for many tasks, such as automatic speech recognition. To address this issue, previous studies have proposed deep-learning-based models that estimate pitch range with spectrum information as input [1-2]. These studies demonstrated that such models can still achieve reliable estimation results when the speech segment is as brief as 300 ms. In this work, we further investigate the robustness of this method. We take the following situations into account: 1) greatly increasing the number of speakers used for model training; 2) second-language (L2) speech data; 3) the influence of monosyllabic utterances with different tones. We conducted experiments accordingly. Experimental results showed that: 1) the accuracy of pitch range estimation improved further after increasing the number of speakers used for model training; 2) the estimation accuracy for L2 learners is similar to that for native speakers; 3) different tonal information has an influence on the LSTM-based model, but this influence is limited compared with that on the baseline method. These results may contribute to speech systems that require pitch features.