Statistical Models For Dealing With Discontinuity Of Fundamental Frequency

Kai Yu
DOI: https://doi.org/10.1007/978-3-662-45258-5_9
2015-01-01
Abstract: The accurate modelling of fundamental frequency, or F0, in HMM-based speech synthesis is a critical factor for achieving high-quality speech. However, it is also difficult because F0 values are normally considered to depend on a binary voicing decision, such that they are continuous in voiced regions and undefined in unvoiced regions. That is, the estimated F0 value is a discontinuous function of time whose domain is partly continuous and partly discrete. This chapter investigates two statistical frameworks for dealing with the discontinuity of F0. Discontinuous F0 modelling strictly defines the probability of a random variable with a discontinuous domain and models it directly. A widely used approach within this framework is the multi-space probability distribution (MSD). An alternative framework is continuous F0 modelling, where continuous F0 observations are assumed to always exist and voicing classification is modelled separately. Both theoretical and experimental comparisons of the two frameworks are given.
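The difference between the two frameworks can be illustrated at the level of the observation sequence. The sketch below is not taken from the chapter: it assumes an F0 track in which `0.0` marks an unvoiced frame, and it uses linear interpolation as one possible way to obtain a continuous F0 track. The function names `to_discontinuous` and `to_continuous` are hypothetical.

```python
from typing import List, Optional, Tuple


def to_discontinuous(f0: List[float]) -> List[Optional[float]]:
    """Discontinuous view: voiced frames keep a real value,
    unvoiced frames carry no value at all (None).
    Assumption: 0.0 in the input marks an unvoiced frame."""
    return [v if v > 0.0 else None for v in f0]


def to_continuous(f0: List[float]) -> Tuple[List[float], List[int]]:
    """Continuous view: every frame has an F0 value, and voicing is
    kept as a separate binary label stream.
    Assumption: unvoiced gaps are filled by linear interpolation, and
    leading/trailing gaps copy the nearest voiced value."""
    voiced = [1 if v > 0.0 else 0 for v in f0]
    idx = [i for i, flag in enumerate(voiced) if flag]
    if not idx:
        raise ValueError("no voiced frames to interpolate from")

    filled = list(f0)
    # Extend the first and last voiced values to the edges.
    for i in range(idx[0]):
        filled[i] = f0[idx[0]]
    for i in range(idx[-1] + 1, len(f0)):
        filled[i] = f0[idx[-1]]
    # Interpolate linearly between consecutive voiced frames.
    for left, right in zip(idx, idx[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = (1.0 - w) * f0[left] + w * f0[right]
    return filled, voiced


if __name__ == "__main__":
    track = [0.0, 100.0, 0.0, 0.0, 130.0, 0.0]  # toy values in Hz
    print(to_discontinuous(track))
    print(to_continuous(track))
```

In the first representation a model must assign probability to a mix of real values and "no value" symbols, which is the situation MSD is designed for. In the second, a single continuous density can cover every frame while the voicing labels are modelled by a separate stream.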