Abstract: Prosody is an essential aspect of speech, as it carries both lexical and non-lexical information. A conventional approach to studying speech prosody is to collect and analyze F0 data under certain hypotheses and then develop a theory from the observations as the final conclusion of the study. This process is, however, incomplete, because the resulting theory is never tested for its ability to predict actual acoustic data. This paper presents PENTATrainer2, a prosody modeling tool based on the parallel encoding and target approximation framework. PENTATrainer2 helps prosody researchers test hypotheses and theories against speech data by means of automatic analysis-by-synthesis and a stochastic learning algorithm. Users can design the annotation scheme flexibly according to their own hypotheses and determine whether the hypothesized categories lead to accurate synthetic F0 contours. PENTATrainer2 consists of three main components: multi-layer annotation, target approximation, and stochastic optimization. First, acoustic data are annotated in parallel layers, each of which corresponds to a functional category that may affect F0 contours. These layers are then compiled into unique functional combinations, which represent the underlying invariant representations of communicative functions and their interactions with each other. The target approximation parameters of each combination are then learned through analysis-by-synthesis and stochastic optimization. Pilot tests of PENTATrainer2 have been conducted on Thai, Mandarin, and English. The results show not only high accuracy of the synthesized F0 contours but also distinctive contrasts in the distribution of pitch target parameters, indicating that PENTATrainer2 is effective in modeling speech prosody.
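The three components named in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the tool: the layer names (tone, focus), the synthetic contours, the function names, and in particular the first-order target approximation formula, which is a deliberately simplified stand-in. The stochastic step is a plain random-perturbation hill climb, not necessarily the optimizer the tool uses.

```python
import math
import random

def compile_combinations(layers):
    """Zip parallel annotation layers into one functional combination per syllable."""
    return list(zip(*layers))

def synthesize(params, onset_f0, n=10, dur=0.2):
    """Simplified first-order approach from onset_f0 toward the linear target m*t + b."""
    m, b, lam = params
    contour = []
    for i in range(n):
        t = dur * i / (n - 1)
        contour.append((m * t + b) + (onset_f0 - b) * math.exp(-lam * t))
    return contour

def make_contours(table, combos, onset=0.0):
    """Generate synthetic 'observed' data from a known parameter table."""
    contours = []
    for combo in combos:
        contour = synthesize(table[combo], onset)
        contours.append(contour)
        onset = contour[-1]  # F0 is continuous across syllable boundaries
    return contours

def total_error(table, combos, contours):
    """Sum of squared errors between synthesized and observed contours."""
    err, onset = 0.0, contours[0][0]
    for combo, observed in zip(combos, contours):
        synth = synthesize(table[combo], onset, n=len(observed))
        err += sum((s - o) ** 2 for s, o in zip(synth, observed))
        onset = synth[-1]
    return err

def fit(combos, contours, iters=2000, seed=0):
    """Analysis-by-synthesis: perturb one combination's parameters, keep improvements."""
    rng = random.Random(seed)
    table = {c: (0.0, 0.0, 20.0) for c in set(combos)}
    initial = best = total_error(table, combos, contours)
    for _ in range(iters):
        combo = rng.choice(sorted(table))
        m, b, lam = table[combo]
        candidate = (m + rng.gauss(0, 1.0),
                     b + rng.gauss(0, 1.0),
                     max(1.0, lam + rng.gauss(0, 2.0)))
        trial = dict(table)
        trial[combo] = candidate
        err = total_error(trial, combos, contours)
        if err < best:
            table, best = trial, err
    return table, initial, best

# Hypothetical two-layer annotation of a four-syllable utterance.
tone_layer = ["H", "L", "H", "L"]
focus_layer = ["no", "yes", "no", "no"]
combos = compile_combinations([tone_layer, focus_layer])

# Synthetic data in semitones relative to an assumed speaker mean.
true_table = {("H", "no"): (0.0, 5.0, 30.0),
              ("L", "yes"): (-10.0, -6.0, 30.0),
              ("L", "no"): (0.0, -3.0, 30.0)}
contours = make_contours(true_table, combos)

learned, initial_error, final_error = fit(combos, contours)
```

Because parameters are stored per functional combination rather than per syllable, the two `("H", "no")` syllables share one parameter set. This is the sense in which a combination acts as an invariant representation: changing the annotation scheme changes which syllables are forced to share parameters, and the resulting synthesis error indicates how well the hypothesized categories account for the data.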

PENTATrainer2: A Hypothesis-Driven Prosody Modeling Tool

Modeling speech melody as communicative functions with PENTAtrainer2

Modelling Japanese Intonation Using PENTAtrainer2

Toward Invariant Functional Representations of Variable Surface Fundamental Frequency Contours: Synthesizing Speech Melody Via Model-Based Stochastic Learning

Explaining the PENTA Model: A Reply to Arvaniti & Ladd (2009)

Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model

The Common Prosody Platform (CPP): Where Theories of Prosody Can Be Directly Compared

A Syllable-Based Prosody Modeling for L1 and L2 English Speeches

Evaluating Prosody of Mandarin Spe

Modeling Tone and Intonation in Mandarin and English As a Process of Target Approximation

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

Modeling Prosody Patterns for Chinese Expressive Text-to-speech Synthesis

Experiment on pitch target approximation model for generating Mandarin F0 contour

Adaptive Filter Based Prosody Modification Approach

Intonation and Prosody Conversion for Expressive Mandarin Speech Synthesis

A Novel Method for Mandarin Speech Synthesis by Inserting Prosodic Structure Prediction into Tacotron2

Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data

Two-Stage Prosody Prediction For Emotional Text-To-Speech Synthesis

Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

Synthesizing Expressive Speech to Convey Focus using a Perturbation Model for Computer-Aided Pronunciation Training

ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis