Abstract: Infants acquire words and phonemes from unsegmented speech signals using segmentation cues, such as distributional, prosodic, and co-occurrence cues. Many existing computational models of this process tend to focus on either distributional or prosodic cues. This paper proposes a nonparametric Bayesian probabilistic generative model called the prosodic hierarchical Dirichlet process-hidden language model (Prosodic HDP-HLM). Prosodic HDP-HLM, an extension of HDP-HLM, considers both prosodic and distributional cues within a single integrative generative model. We conducted three experiments on different types of datasets and demonstrated the validity of the proposed method. The results show that the resulting method, the prosodic double articulation analyzer (Prosodic DAA), successfully uses prosodic cues and outperforms a method that uses only distributional cues. The main contributions of this study are as follows: 1) We develop a probabilistic generative model for time series data including prosody that potentially has a double articulation structure; 2) We propose Prosodic DAA by deriving the inference procedure for Prosodic HDP-HLM and show that Prosodic DAA can discover words directly from continuous human speech signals using statistical and prosodic information in an unsupervised manner; 3) We show that prosodic cues contribute more to word segmentation when words are naturally distributed, i.e., when their frequencies follow Zipf's law.

Improvement of Probabilistic Acoustic Tube model for speech decomposition

Incorporating AM-FM Effect in Voiced Speech for Probabilistic Acoustic Tube Model

Probabilistic Acoustic Tube: a Probabilistic Generative Model of Speech for Speech Analysis/synthesis

Use of Particle Filtering and MCMC for Inference in Probabilistic Acoustic Tube Model

A New Acoustic Modeling of Inter-Syllable Context-Dependent Units for Putonghua Continuous Speech Recognition

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

A New Combined Model of Statics-Dynamics of Speech

Partial-tied-mixture Auxiliary Chain Models for Speech Recognition Based on Dynamic Bayesian Networks

Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis

Acoustic statistical modeling based new generation speech synthesis technology

Acoustic Modeling Based On Chinese Phonetics Knowledge

Integrating Articulatory Features into HMM-Based Parametric Speech Synthesis

PAN: Phoneme-Aware Network for Monaural Speech Enhancement

Double Articulation Analyzer with Prosody for Unsupervised Word and Phoneme Discovery

Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation

Acoustic Statistical Modeling Based Speech Synthesis Technologies

Pitch-scaled Analysis Based Residual Reconstruction for Speech Analysis and Synthesis

Bayesian estimation of dissipation and sound speed in tube measurements using a transfer-function model

A Parametric Model for Voice Conversion

Improving HMM Based Speech Synthesis by Reducing Over-Smoothing Problems