Whisper to Normal Speech Based on Deep Neural Networks with MCC and F0 Features

Hailun Lian, Yuting Hu, Jian Zhou, Huabin Wang, Liang Tao
DOI: https://doi.org/10.1109/icdsp.2018.8631888
2018-01-01
Abstract: In this paper, we propose a method for converting whispered speech to normal speech using low-dimensional Mel-cepstral coefficients (MCC) combined with deep neural networks (DNN). The conversion is divided into two modules: spectrum conversion and fundamental frequency (F0) estimation. The MCC features characterize the spectral envelope, and a DNN models the relationship between the low-dimensional MCC of whispered speech and those of its normal counterpart. The DNN not only fits the data well but can also address the issue of smoothness. In the F0 estimation module, F0 is estimated from the MCC features of both normal and whispered speech. Specifically, F0 for voiced and unvoiced frames is estimated simultaneously in order to reduce modeling complexity. Experimental results show that the converted speech achieves better performance in terms of both speech quality and intelligibility.
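The abstract describes two DNN modules: one maps whispered MCC frames to normal MCC frames, and one estimates F0 from the MCC of both whispered and normal speech, handling voiced and unvoiced frames in a single model. The following is a minimal pure-Python sketch of that data flow, not the authors' implementation. Several assumptions are made and should be read as hypothetical: a tiny one-hidden-layer network (`MLP`) stands in for the DNN; the MCC frames are random synthetic vectors rather than real speech features; and "estimating voiced and unvoiced frames simultaneously" is interpreted as a single regression output where unvoiced frames have target 0 and a threshold (`VUV_THRESHOLD`) decides voicing. The names `MLP`, `convert`, `toy_corpus`, `F0_SCALE`, and `VUV_THRESHOLD` are all illustrative.

```python
import math
import random


class MLP:
    """One-hidden-layer feedforward net (tanh hidden, linear output) trained by
    per-frame SGD on squared error. A deliberately tiny stand-in for a DNN."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(n_in), 1.0 / math.sqrt(n_hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, x):
        return self._forward(x)[1]

    def train_step(self, x, target, lr=0.05):
        """One SGD update on 0.5 * ||y - target||^2; returns the loss before the update."""
        h, y = self._forward(x)
        err = [yk - tk for yk, tk in zip(y, target)]
        # Hidden-layer gradient uses the output weights *before* they are updated.
        dh = [(1.0 - h[j] ** 2) * sum(err[k] * self.w2[k][j] for k in range(len(err)))
              for j in range(len(h))]
        for k, e in enumerate(err):
            self.w2[k] = [w - lr * e * hj for w, hj in zip(self.w2[k], h)]
            self.b2[k] -= lr * e
        for j, d in enumerate(dh):
            self.w1[j] = [w - lr * d * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * d
        return 0.5 * sum(e * e for e in err)


F0_SCALE = 100.0     # hypothetical: F0 targets are stored in units of 100 Hz
VUV_THRESHOLD = 0.5  # hypothetical: predictions below 50 Hz count as unvoiced


def f0_input(whisper_mcc, normal_mcc):
    """The F0 net sees both the whispered MCC and the (converted) normal MCC."""
    return list(whisper_mcc) + list(normal_mcc)


def convert(whisper_frames, spec_net, f0_net):
    """Return (converted MCC, F0 in Hz) per frame; F0 = 0.0 marks an unvoiced frame."""
    out = []
    for w in whisper_frames:
        mcc = spec_net.predict(w)                    # module 1: spectrum conversion
        f0 = f0_net.predict(f0_input(w, mcc))[0]     # module 2: F0 estimation
        out.append((mcc, f0 * F0_SCALE if f0 >= VUV_THRESHOLD else 0.0))
    return out


def toy_corpus(n_frames, dim, seed=1):
    """Synthetic parallel data: NOT real speech, just a learnable stand-in mapping."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_frames):
        w = [rng.uniform(-1, 1) for _ in range(dim)]
        n = [0.8 * v + 0.2 * math.sin(3 * v) for v in w]  # pretend "normal" MCC
        f0 = (1.0 + 0.5 * w[1]) if w[0] > 0 else 0.0       # scaled F0, 0 if unvoiced
        data.append((w, n, f0))
    return data


if __name__ == "__main__":
    DIM = 4  # toy MCC order; real systems use more coefficients
    corpus = toy_corpus(200, DIM)
    spec_net = MLP(DIM, 16, DIM, seed=2)
    f0_net = MLP(2 * DIM, 16, 1, seed=3)
    for epoch in range(30):
        spec_loss = sum(spec_net.train_step(w, n) for w, n, _ in corpus) / len(corpus)
        # Trained on natural normal MCC here; at conversion time converted MCC is used.
        f0_loss = sum(f0_net.train_step(f0_input(w, n), [f0])
                      for w, n, f0 in corpus) / len(corpus)
    print(f"final losses: spectrum {spec_loss:.4f}, F0 {f0_loss:.4f}")
    for mcc, f0_hz in convert([w for w, _, _ in corpus[:3]], spec_net, f0_net):
        print([round(c, 3) for c in mcc], round(f0_hz, 1))
```

Running the script trains both toy networks and prints three converted frames. In a real system the MCC would come from a speech analysis front end, and the converted MCC and F0 would typically be passed to a vocoder to synthesize the waveform; neither step is shown here.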