Whisper-to-speech Conversion Using Restricted Boltzmann Machine Arrays

Jing-Jie Li, Ian V. McLoughlin, Li-Rong Dai, Zhen-Hua Ling
DOI: https://doi.org/10.1049/el.2014.1645
2014-01-01
Electronics Letters
Abstract: Whispers are a natural vocal communication mechanism in which the vocal cords do not vibrate normally. The lack of glottal-induced pitch leads to low energy, and an inherently noise-like spectral distribution reduces intelligibility. Much research has been devoted to the processing of whispers, including whisper-to-speech conversion. Unfortunately, across the several approaches proposed so far, even the best reconstructed speech still contains obviously artificial muffling and suffers from unnatural prosody. To address these issues, this letter reports the novel use of multiple restricted Boltzmann machines (RBMs) as a statistical conversion model between whisper and speech spectral envelopes. Moreover, pitch estimation accuracy is improved by applying machine learning techniques to estimate pitch only within voiced (V) regions. Both objective and subjective evaluations show that the new method improves the quality of whisper-reconstructed speech compared with state-of-the-art approaches.
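The abstract does not specify the model in detail, so the following is only a minimal sketch under stated assumptions. It uses a single Gaussian-Bernoulli RBM (unit-variance Gaussian visible units, binary hidden units) trained with one-step contrastive divergence (CD-1) on joint vectors that concatenate a whisper frame with its time-aligned speech frame. Conversion clamps the whisper half and runs a few mean-field updates on the speech half. This inference heuristic, the toy two-class data, and every name below (`GaussianBernoulliRBM`, `convert`, `toy_joint_frames`) are hypothetical and not taken from the paper. An "array" of RBMs would presumably train several such models, but the abstract does not state how data are divided among them.

```python
import math
import random


def sigmoid(a):
    # Numerically safe logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


class GaussianBernoulliRBM:
    """Hypothetical sketch: unit-variance Gaussian visibles, binary hiddens."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        # Small random weights; W[i][j] connects visible i to hidden j.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases (Gaussian means)
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_mean(self, h):
        # E[v_i | h] = b_i + sum_j W_ij h_j (unit variance assumed)
        return [self.b[i] + sum(self.W[i][j] * h[j] for j in range(self.nh))
                for i in range(self.nv)]

    def cd1_update(self, v0, lr):
        # One CD-1 step on a single joint frame.
        p0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in p0]
        v1 = self.visible_mean(h0)  # mean-field reconstruction of the visibles
        p1 = self.hidden_probs(v1)
        for i in range(self.nv):
            for j in range(self.nh):
                self.W[i][j] += lr * (v0[i] * p0[j] - v1[i] * p1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(self.nh):
            self.c[j] += lr * (p0[j] - p1[j])

    def recon_error(self, data):
        # Mean squared reconstruction error using deterministic mean-field passes.
        total = 0.0
        for v in data:
            r = self.visible_mean(self.hidden_probs(v))
            total += sum((p - q) ** 2 for p, q in zip(v, r))
        return total / len(data)

    def convert(self, x, n_iter=20):
        # Hypothetical conversion heuristic: clamp the whisper part x and
        # iterate mean-field updates on the speech part y of the joint vector.
        n_x = len(x)
        y = self.b[n_x:]  # initial guess: visible biases of the speech part
        for _ in range(n_iter):
            h = self.hidden_probs(list(x) + y)
            y = self.visible_mean(h)[n_x:]
        return y


def toy_joint_frames(n, seed=1):
    # Hypothetical toy data: two "sound classes", each pairing a 2-D whisper
    # feature with a 2-D speech feature (stand-ins, not real spectra).
    rng = random.Random(seed)
    centres = [([1.0, 0.0], [2.0, 1.0]), ([0.0, 1.0], [-1.0, 2.0])]
    frames = []
    for k in range(n):
        wx, sy = centres[k % 2]
        frames.append([val + rng.gauss(0.0, 0.1) for val in wx + sy])
    return frames


if __name__ == "__main__":
    data = toy_joint_frames(40)
    rbm = GaussianBernoulliRBM(n_visible=4, n_hidden=3)
    before = rbm.recon_error(data)
    for epoch in range(200):
        for v in data:
            rbm.cd1_update(v, lr=0.01)
    after = rbm.recon_error(data)
    print(f"reconstruction error: {before:.3f} -> {after:.3f}")
    print("converted speech part:", rbm.convert([1.0, 0.0]))
```

A practical system would use higher-dimensional spectral envelope features, usually standardized per dimension to suit the unit-variance assumption, and a vectorized numerical library; pure Python is used here only to keep the sketch dependency-free.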