Abstract: This paper addresses the problem of automatically labeling focus word pairs in spontaneous spoken English, where a focus word pair refers to the salient part of text or speech and the word motivating it. The prediction of focus word pairs is important for speech applications such as expressive text-to-speech (TTS) synthesis and speech recognition. It can also support better textual and intention understanding in spoken dialog systems. Traditional approaches such as support vector machine (SVM) prediction neglect the dependency between words and are hampered by the imbalanced distribution of positive and negative samples in the dataset. This paper introduces conditional random fields (CRFs) to the task of automatically predicting focus word pairs from lexical, syntactic and semantic features. Furthermore, several new features related to syntactic and semantic information are proposed to achieve better performance. Experiments on the publicly available Switchboard corpus demonstrate that the CRF model outperforms the baseline and the SVM model for focus word pair prediction, and that the newly proposed features further improve the performance of the CRF-based predictor. Specifically, compared to the low recall rate of 11.31% achieved by the SVM model, the proposed CRF-based predictor yields a high recall rate of 70.88% with little impact on precision.
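The abstract's comparison rests on precision and recall for a rare positive class. The following minimal Python sketch shows how a predictor that rarely emits the positive label can look accurate while having low recall. The toy labels, the `conservative` predictor, and the helper name `precision_recall` are illustrative assumptions. They are not taken from the paper or from the Switchboard corpus, and they do not reproduce its SVM or CRF models.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, made up for illustration: 20 words, only 4 of them positive.
gold = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

# Hypothetical conservative predictor: it marks a single word as positive.
conservative = [0] * 20
conservative[2] = 1

accuracy = sum(g == p for g, p in zip(gold, conservative)) / len(gold)
precision, recall = precision_recall(gold, conservative)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# prints: accuracy=0.85 precision=1.00 recall=0.25
```

In this toy case the predictor is right on 85% of the words and never raises a false alarm, yet it finds only one of the four positives. This is why recall is a more informative measure than accuracy when positive samples are scarce.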

Prosodic Structure Prediction Based on Conditional Random Field Model

Chinese Prosodic Word Prediction Using the Conditional Random Fields

A Two-stage Prosodic Structure Generation Strategy for Mandarin Text-to-speech Systems

Statistical Model Based on Probability Frequency for Mandarin Prosodic Structure Prediction

Exploiting Prosodic and Lexical Features for Tone Modeling in A Conditional Random Field Framework

BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End

Mandarin prosodic word prediction using dependency relationships

Prosodic boundary prediction based on maximum entropy model with error-driven modification

Prosodic Structure Prediction Using Deep Self-attention Neural Network

Mongolian prosodic phrase prediction using suffix segmentation

Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features

Prosodic Annotation Enriched Statistical Machine Translation

Statistic Prosody Structure Prediction

Rule-learning Based Prosodic Structure Prediction

Predicting Chinese Prosodic Word Based on Transformation-Based Error-Driven Learning

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

A Character-level Span-based Model for Mandarin Prosodic Structure Prediction

Prosodic Word Boundaries Prediction for Mandarin Text-to-Speech

Learning rules for Chinese prosodic phrase prediction

Using Conditional Random Fields to Predict Focus Word Pair in Spontaneous Spoken English

Pitch Prediction for Mandarin TTS with Mutual Prosodic Constraint