PENTATrainer2: A Hypothesis-Driven Prosody Modeling Tool

Santitham Prom-on, Yi Xu
DOI: https://doi.org/10.36505/exling-2012/05/0024/000230
2019-01-01
Abstract: Prosody is an essential aspect of speech, as it carries both lexical and non-lexical information. A conventional approach to studying speech prosody is to collect and analyze F0 data based on certain hypotheses and then develop a theory from the observations as the final conclusion of the study. This process is, however, far from complete, because the resulting theory has not been tested for its ability to predict actual acoustic data. This paper presents PENTATrainer2, a prosody modeling tool based on the parallel encoding and target approximation (PENTA) framework. PENTATrainer2 helps prosody researchers test hypotheses and theories against speech data by using an automatic analysis-by-synthesis and stochastic learning algorithm. Users can flexibly design the annotation scheme based on their own hypotheses and determine whether the hypothesized categories lead to accurate synthetic F0 contours. PENTATrainer2 consists of three main components: multi-layer annotation, target approximation, and stochastic optimization. First, acoustic data are annotated in parallel layers, each of which corresponds to a functional category that may affect F0 contours. These layers are then compiled into unique functional combinations, which represent the underlying invariant representations of communicative functions and their interactions with one another. The target approximation parameters of each combination are then learned through analysis-by-synthesis and stochastic optimization. Pilot tests of PENTATrainer2 have been conducted on Thai, Mandarin, and English. The results show not only high accuracy of the synthesized F0 contours but also distinctive contrasts in the distribution of pitch target parameters, indicating that PENTATrainer2 is effective in modeling speech prosody.
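The three components named in the abstract can be illustrated with a small Python sketch. It is a toy built on stated assumptions, not PENTATrainer2's implementation: the layer names and labels, the helper names (compile_combinations, synthesize, fit), the simplified model in which each syllable's F0 decays exponentially toward a static target, and the accept-if-better random search are all hypothetical choices made only to show how parallel layers collapse into functional combinations whose parameters are then learned by analysis-by-synthesis.

```python
import math
import random
from collections import defaultdict

def compile_combinations(layers):
    """Zip parallel annotation layers into one label tuple per syllable."""
    names = sorted(layers)
    lengths = {len(layers[n]) for n in names}
    if len(lengths) != 1:
        raise ValueError("all layers must cover the same syllables")
    return [tuple(layers[n][i] for n in names) for i in range(lengths.pop())]

def synthesize(combos, params, samples, f0_start, dt=0.01):
    """Toy target approximation (assumed model): within each syllable, F0
    moves exponentially from its onset value toward a static target."""
    contour, current = [], f0_start
    for combo, n in zip(combos, samples):
        target, rate = params[combo]
        onset = current  # contour is continuous across syllable boundaries
        for k in range(1, n + 1):
            current = target + (onset - target) * math.exp(-rate * k * dt)
            contour.append(current)
    return contour

def sse(a, b):
    """Sum of squared errors between two equal-length contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(combos, samples, observed, f0_start, iters=2000, seed=0):
    """Accept-if-better random search over one (target, rate) per combination."""
    rng = random.Random(seed)
    mean_f0 = sum(observed) / len(observed)
    params = {c: (mean_f0, 20.0) for c in set(combos)}
    best = initial = sse(synthesize(combos, params, samples, f0_start), observed)
    keys = sorted(params)
    for _ in range(iters):
        combo = rng.choice(keys)
        target, rate = params[combo]
        trial = dict(params)
        trial[combo] = (target + rng.gauss(0, 1.0),
                        max(1.0, rate + rng.gauss(0, 2.0)))
        err = sse(synthesize(combos, trial, samples, f0_start), observed)
        if err < best:  # keep only proposals that reduce synthesis error
            params, best = trial, err
    return params, initial, best

# Hypothetical annotation: two layers over five syllables.
layers = {
    "tone": ["H", "L", "H", "L", "H"],
    "focus": ["pre", "on", "post", "post", "post"],
}
combos = compile_combinations(layers)  # tuples ordered as (focus, tone)

# Group syllable indices that share a functional combination.
groups = defaultdict(list)
for index, combo in enumerate(combos):
    groups[combo].append(index)
groups = dict(groups)

# Made-up "observed" contour (semitones), generated from known parameters.
true_params = {
    ("pre", "H"): (92.0, 30.0),
    ("on", "L"): (84.0, 40.0),
    ("post", "H"): (88.0, 25.0),
    ("post", "L"): (83.0, 25.0),
}
samples = [20] * 5  # F0 samples per syllable
observed = synthesize(combos, true_params, samples, f0_start=90.0)

fitted, initial_err, final_err = fit(combos, samples, observed, f0_start=90.0)
print(len(groups), "combinations; error", initial_err, "->", final_err)
```

Because a proposal is kept only when it lowers the squared error, the final error can never exceed the initial one. Syllables 2 and 4 share the combination ("post", "H") and therefore share a single parameter pair, which is the sense in which a combination acts as an invariant representation in this sketch.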