Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network.

Yao Du, Zhiyong Wu, Shiyin Kang, Dan Su, Dong Yu, Helen Meng
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023299
2019-01-01
Abstract: Speech corpora labeled with prosodic structure information are crucial for training text-to-speech (TTS) synthesis models that generate high-quality, natural synthetic speech. Traditional manual prosodic structure labeling is laborious and time-consuming, and labels produced by different annotators may be inconsistent. Automatic prosodic labeling is therefore desirable: it can speed up the labeling process and protect the results from this inconsistency. This paper presents a DNN-BGRU-CRF hybrid neural network, which combines the advantages of deep neural networks (DNN), bidirectional gated recurrent units (BGRU), and conditional random fields (CRF), to label three-level prosodic structure boundaries. The model exploits both text and acoustic cues in a neural network framework. Experimental results demonstrate the effectiveness of the proposed model.
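To illustrate the role a CRF layer typically plays on top of a recurrent encoder, the following is a minimal sketch of Viterbi decoding over per-token label scores. It is not the authors' implementation: the label names, the function `viterbi_decode`, and the score matrices are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence

# Hypothetical label set: the abstract only says "three-level" boundaries,
# so these names are illustrative assumptions (plus a "no boundary" class).
LABELS = ["NB", "L1", "L2", "L3"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][j]   : score of label j at position t (e.g. from a BGRU layer)
    transitions[i][j] : score of moving from label i to label j (CRF parameters)
    """
    if not emissions:
        return []
    n_labels = len(emissions[0])
    # best[j] = best score of any path ending in label j at the current step
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(n_labels):
            # Pick the previous label that maximizes the path score into j
            prev = max(range(n_labels), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path
```

The point of the sketch is that the transition scores let the decoder prefer a globally consistent label sequence over independent per-token choices, which is the usual motivation for placing a CRF after a recurrent layer. In a trained model, both score matrices would be learned rather than supplied by hand.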