Abstract: Multi-objective learning with the minimum mean squared error criterion for DNN-based speech enhancement (MMSE-MOL-DNN) has been shown to outperform a single-output DNN. However, one problem with MMSE-MOL-DNN is that the prediction error values on different targets span a very broad dynamic range, which makes DNN training difficult. In this paper, we extend the maximum likelihood approach proposed in our previous work [1] to multi-objective learning for DNN-based speech enhancement (ML-MOL-DNN), so that the dynamic range of the prediction error values on different targets is adjusted automatically. The conditional likelihood function to be maximized is derived under the generalized Gaussian distribution (GGD) error model, and the dynamic range of the prediction error values on different targets is controlled by the scale factors of the GGD. Furthermore, instead of adjusting the shape factors manually, we propose a method that updates them automatically by exploiting the one-to-one mapping between the kurtosis and the shape factor of the GGD. The experimental results show that ML-MOL-DNN achieves better performance than MMSE-MOL-DNN on several objective measures.
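
The kurtosis-to-shape mapping and the role of the scale factor can be illustrated with a short sketch. It is not the paper's implementation: it assumes the standard zero-mean GGD density p(e) = β / (2αΓ(1/β)) · exp(−(|e|/α)^β), whose kurtosis is Γ(5/β)Γ(1/β)/Γ(3/β)², and it inverts that mapping by bisection. The function names, the search interval for β, and the closed-form scale estimate are assumptions made for illustration only.

```python
import math


def ggd_kurtosis(beta):
    """Kurtosis of a zero-mean GGD with shape factor beta (computed in log space)."""
    return math.exp(
        math.lgamma(5.0 / beta) + math.lgamma(1.0 / beta) - 2.0 * math.lgamma(3.0 / beta)
    )


def shape_from_kurtosis(kurt, lo=0.2, hi=10.0, iters=100):
    """Invert the kurtosis-to-shape mapping by bisection.

    The kurtosis decreases monotonically as beta grows, so the mapping is
    one-to-one. The interval [lo, hi] is an assumed search range; targets
    outside the kurtosis values it covers are clamped to its ends.
    """
    kurt = min(max(kurt, ggd_kurtosis(hi)), ggd_kurtosis(lo))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_kurtosis(mid) > kurt:
            lo = mid  # kurtosis still too high, so beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_kurtosis(errors):
    """Empirical kurtosis of prediction errors, assuming they are zero-mean."""
    n = len(errors)
    m2 = sum(e * e for e in errors) / n
    m4 = sum(e ** 4 for e in errors) / n
    return m4 / (m2 * m2)


def ml_scale(errors, beta):
    """Maximum likelihood estimate of the scale factor alpha for a fixed beta."""
    n = len(errors)
    return (beta / n * sum(abs(e) ** beta for e in errors)) ** (1.0 / beta)


def ggd_nll(errors, alpha, beta):
    """Mean negative log-likelihood of the errors under a zero-mean GGD."""
    n = len(errors)
    const = math.log(2.0 * alpha) + math.lgamma(1.0 / beta) - math.log(beta)
    return const + sum((abs(e) / alpha) ** beta for e in errors) / n
```

With β = 2 the GGD reduces to a Gaussian (kurtosis 3) and with β = 1 to a Laplacian (kurtosis 6). In a multi-objective setting, each target would get its own (α, β) pair, so that errors with a large spread are divided by a correspondingly large α before they contribute to the total loss.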

A combined model of statics-dynamics of speech optimized using maximum mutual information

A New Combined Model of Statics-Dynamics of Speech

From Linear Prediction HMM to a New Combined Model for Speech Recognition

HMM training method based on evolutionary computation and MDI in speech recognition

Maximum Likelihood I-Vector Space Using PCA for Speaker Verification

Discriminative Dynamic Gaussian Mixture Selection with Enhanced Robustness and Performance for Multi-Accent Speech Recognition

Discriminative Combination of Multiple Linear Predictions for Speech Recognition

Parametric model of introducing inter-frame correlation information into hidden Markov model for speech recognition

Improvement of hidden Markov model (HMM) for speech recognition

An equivalent-class based MMI learning method for MGCPM

Duration-Distribution-Based HMM for Speech Recognition

A Maximum Likelihood Approach to Multi-Objective Learning Using Generalized Gaussian Distributions for DNN-Based Speech Enhancement

An inhomogeneous HMM speech recognition algorithm

Improved HMM Model Using Spatial Correlation

A New Method Used in HMM for Modeling Frame Correlation

Discriminative Speaker Adaptation with Eigenvoices

Partial-tied-mixture Auxiliary Chain Models for Speech Recognition Based on Dynamic Bayesian Networks

A Speech Recognition System Based on a Hybrid HMM/SVM Architecture

Probabilistic Speaker-Class Based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition

Combining HMM and SPSM for Sign Language Recognition

Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis