Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis

Li Gao, Zhen-Hua Ling, Ling-Hui Chen, Li-Rong Dai
DOI: https://doi.org/10.1109/ISCSLP.2014.6936598
2014-01-01
Abstract: Speech generated by hidden Markov model (HMM) based speech synthesis always sounds monotonous compared with natural recordings. An important reason is that the predicted F0 trajectories are over-smoothed. This over-smoothing arises from the use of frame-level F0 features and the averaging effect of Gaussian acoustic modeling in the conventional F0 modeling approach. In this paper, we propose a method that improves the F0 prediction of HMM-based Mandarin speech synthesis by post-filtering. Syllable-level F0 features, e.g., length-normalized logF0 vectors or quantitative target approximation (qTA) parameters, are extracted from the F0 trajectories predicted by the conventional approach. These features are then mapped towards natural ones by a Gaussian bidirectional associative memory (GBAM) based transformation. Our subjective experiments indicate that the GBAM-based F0 post-filtering method, using either logF0 vectors or qTA parameters, can significantly improve the naturalness of synthetic speech. Using raw logF0 vectors for post-filtering achieves better performance than using derived qTA parameters.
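As an illustration of what a length-normalized logF0 vector could look like, the following is a minimal sketch that resamples the F0 contour of one syllable to a fixed number of points. The function name `length_normalized_logf0`, the default of 10 points, the use of linear interpolation, and the restriction to voiced (positive) F0 values are all assumptions made for this example; the abstract does not specify how the authors perform the normalization, and the GBAM-based transformation itself is not shown.

```python
import math


def length_normalized_logf0(f0_hz, num_points=10):
    """Resample one syllable's F0 contour (in Hz) to a fixed-length logF0 vector.

    Hypothetical helper: the vector length and the linear interpolation
    are assumptions, not details taken from the paper.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    if len(f0_hz) == 0 or any(f <= 0 for f in f0_hz):
        raise ValueError("expected a non-empty contour of positive F0 values")

    # Work in the log domain, as the abstract refers to logF0 vectors.
    log_f0 = [math.log(f) for f in f0_hz]

    # A single-frame syllable is simply repeated.
    if len(log_f0) == 1:
        return [log_f0[0]] * num_points

    last = len(log_f0) - 1
    vector = []
    for i in range(num_points):
        # Fractional frame position of the i-th output point.
        pos = i * last / (num_points - 1)
        lo = min(int(pos), last - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring frames.
        vector.append((1.0 - frac) * log_f0[lo] + frac * log_f0[lo + 1])
    return vector


if __name__ == "__main__":
    # Made-up contour for illustration only (a rising tone, in Hz).
    contour = [180.0, 185.0, 192.0, 201.0, 213.0, 226.0, 240.0]
    print(length_normalized_logf0(contour, num_points=5))
```

With this kind of representation, syllables of different durations yield vectors of the same dimension, which is what makes a syllable-level mapping between predicted and natural F0 features possible.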