Abstract: In concatenative speech synthesis, unit selection chooses speech units from a speech corpus by measuring how well candidate units match the given target features. Perceptual tests indicate that some features are consistently preferred for making perceptual distinctions between units, so these features should be judged before the others in unit selection. In this work, we identify the priorities of different features and optimize unit selection with perceptual clustering. Our approach first groups the speech units by hierarchical clustering based on a perceptual distance measure between units. We then propose a method for identifying the questions (concerning the features) used to build a decision tree from the clustering result. The features used in the decision tree are the preferred ones; the remaining features are used in the target cost function. Linear discriminant analysis (LDA) is then applied to the clustering result to train the weights of the target cost function, so that the weights are better motivated and perceptually relevant. Experimental results indicate that the optimized unit selection generates synthetic speech with higher naturalness than the previous approach.
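The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes a precomputed, symmetric matrix of perceptual distances between units and uses average linkage; the abstract does not state which linkage rule, distance measure, or stopping criterion the authors use, so the function name `average_linkage`, the `clusters_wanted` parameter, and the toy distances below are all hypothetical.

```python
from itertools import combinations


def average_linkage(dist, clusters_wanted):
    """Agglomerative clustering over a precomputed symmetric distance matrix.

    dist[i][j] is an assumed perceptual distance between units i and j.
    Returns a sorted list of clusters, each a sorted list of unit indices.
    """
    # Start with every unit in its own cluster.
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > clusters_wanted:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Toy data (hypothetical): units 0 and 1 sound alike, as do units 2 and 3.
toy_dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
result = average_linkage(toy_dist, 2)
print(result)
```

In the pipeline the abstract outlines, cluster labels of this kind would then serve as targets both for choosing decision-tree questions and for LDA-based weight training; neither of those steps is shown here.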

Perceptual Clustering Based Unit Selection Optimization for Concatenative Text-to-speech Synthesis

A novel unit selection method for concatenation speech system using similarity measure

Selecting optimal non-uniform units for hierarchical unit selection

Perceptual Evaluation Weight Training for Text-to-Speech Synthesis

Statistical Acoustic Model Based Unit Selection Algorithm for Speech Synthesis

Context features based pre-selection and weight prediction in concatenation speech synthesis system

HMM-Based Hierarchical Unit Selection Combining Kullback-Leibler Divergence with Likelihood Criterion

Hierarchical Non-Uniform Unit Selection Based on Prosodic Structure

HMM-based Unit Selection Using F

HMM-based Unit Selection Speech Synthesis Using Log Likelihood Ratios Derived from Perceptual Data

Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech

Unit Selection Speech Synthesis Integrating Automatic Error Detection

Trainable Unit Selection Speech Synthesis under Statistical Framework

Hybrid Unit Model Based Non-uniform Unit Selection

A data driven method for target and concatenation cost calculation with KL-Divergence in Mandarin hybrid speech synthesis

A Decision Tree Based Approach for Construction of Speech Database and Unit Selection

Stable boundary-based non-uniform unit selection in speech synthesis

Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation

DNN-based unit selection using frame-sized speech segments

A Novel Hybrid Mandarin Speech Synthesis System Using Different Base Units for Model Training and Concatenation