An Audio-Visual Whisper Database in Chinese

Jian Zhou, Yuting Hu, Hailun Lian, Cong Pang, Huabin Wang, Liang Tao
DOI: https://doi.org/10.1088/1742-6596/1237/2/022106
2019-01-01
Journal of Physics: Conference Series
Abstract: Converting whispered speech to normal voiced speech is an active research topic in speech signal processing, and a complete, large-scale whisper database is a major foundation for this task. In this paper, we propose a multimodal whisper database in Mandarin Chinese. A total of 103 syllables and 100 sentences were carefully selected. Five male and five female participants pronounced the syllables and sentences in both whispered and normal styles, resulting in 4096 parallel speech utterances and 263,849 frames of face and lip image sequences captured while speaking. The beginning and ending sample points of each syllable were labeled for both the speech signal and the face video. The lip regions of interest were also extracted and are provided in the database. Experiments on various speech conversion tasks with different speech databases show the effectiveness of the proposed multimodal whisper speech database.