Automatic Prosodic Boundary Labeling Based on Fusing the Silence Duration with the Lexical Features

FU Ruibo, TAO Jianhua, LI Ya, WEN Zhengqi
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2018.21.003
2018-01-01
Abstract: Automatic prosodic boundary labeling is important in constructing a speech corpus for speech synthesis. Manual labeling of prosodic boundaries is time consuming and inconsistent, whereas automatic labeling gives more consistent results. The manual labeling process is modelled here with a recurrent neural network that trains two sub-models, one using lexical features and one using acoustic features, to label the prosodic boundaries. Model fusion then combines the outputs of the two sub-models to obtain the optimal labeling results. The silence duration for each word has a clearer physical meaning and correlates better with the prosodic boundaries than the frame-by-frame acoustic features used in traditional methods. Tests show that using the extracted silence durations as the acoustic features, together with the model fusion method, improves prosodic boundary labeling compared with previous feature fusion methods.
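The two ideas in the abstract, a per-word silence duration feature and fusion of the two sub-models' outputs, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the label set `LABELS`, the function names, the choice to measure the silence that follows each word from a forced alignment, and the use of weighted linear interpolation of posteriors as the fusion rule. The abstract does not state which fusion scheme or label inventory the authors use.

```python
from typing import List, Sequence, Tuple

# Hypothetical label inventory (assumption): no boundary, prosodic word,
# prosodic phrase, intonational phrase.
LABELS = ("NB", "PW", "PPH", "IPH")


def silence_after_words(
    alignment: Sequence[Tuple[str, float, float]]
) -> List[float]:
    """Return the silence duration in seconds that follows each word.

    `alignment` holds (word, start_time, end_time) tuples, e.g. from a
    forced aligner. The last word has no successor, so this sketch
    assigns it 0.0 (an assumption, not the paper's convention).
    """
    durations = []
    for i, (_, _, end) in enumerate(alignment):
        if i + 1 < len(alignment):
            next_start = alignment[i + 1][1]
            # Clamp at zero in case of slightly overlapping alignments.
            durations.append(max(0.0, next_start - end))
        else:
            durations.append(0.0)
    return durations


def fuse_posteriors(
    lexical: Sequence[Sequence[float]],
    acoustic: Sequence[Sequence[float]],
    weight: float = 0.5,
) -> List[str]:
    """Late (model-level) fusion of two sub-models' class posteriors.

    For each word position, interpolate the lexical and acoustic
    posteriors with `weight` on the lexical model, then pick the label
    with the highest fused score. Linear interpolation is an assumed
    fusion rule chosen for simplicity.
    """
    fused_labels = []
    for p_lex, p_ac in zip(lexical, acoustic):
        fused = [weight * a + (1.0 - weight) * b for a, b in zip(p_lex, p_ac)]
        best = max(range(len(fused)), key=fused.__getitem__)
        fused_labels.append(LABELS[best])
    return fused_labels
```

In this sketch the silence durations would be the input to the acoustic sub-model, and `fuse_posteriors` would be applied to the per-word outputs of the two trained sub-models. Fusing at the output level, rather than concatenating lexical and acoustic features into one input vector, is what distinguishes model fusion from the feature fusion methods the abstract compares against.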