Reconstruction of Pitch for Whisper-to-speech Conversion of Chinese

Jingjie Li, Ian Vince McLoughlin, Yan Song
DOI: https://doi.org/10.1109/iscslp.2014.6936709
2014-01-01
Abstract: Whispers are a common and necessary secondary vocal communications mechanism for natural human-to-human dialogue. They are also the primary communications mechanism for many suffering from aphonia, such as laryngectomees. For typical speakers, whispering is a predominantly contextual activity, prompted by either the sensitive nature of information being conveyed or in response to environmental considerations. Given the importance of whispers, especially for tonal languages like Chinese, and the fact that many communications systems assume vocalised speech, much work has been directed towards the conversion of whispers into natural-sounding speech. Since pitch information is largely absent in whispers, it is this key f0 information which needs to be supplied during the regeneration process, and which is the focus of much research. GMM-based reconstruction techniques have proven effective at whisper reconstruction, and some recent work has proposed the use of artificial pitch derived from formant harmonics as an alternative. This paper describes a new formulation of the formant-harmonic f0 method, and compares this directly against a novel GMM-based f0 estimator, as well as known correct pitch excitation for parallel utterances.