The Method of Disentangled and Interpretable Representations for Speech Enhancement

Kun Zhao, Yuanhang Yang, Yutian Wang, Hui Wang
DOI: https://doi.org/10.1109/IAEAC50856.2021.9390768
2021-01-01
Abstract: We study the problem of monophonic speech enhancement under a variety of noise conditions. Most speech enhancement models successfully map noisy speech features to clean speech features with a deep network. However, analyzing the multi-scale information in sequence data, which is important for speech signals, remains a major challenge. In this paper, we present an end-to-end hierarchical model that combines the factorized hierarchical variational autoencoder (FHVAE) with independently recurrent neural network (IndRNN) structures and learns disentangled and interpretable representations from speech data in an unsupervised way. Compared to other end-to-end models, the encoder captures both the linguistic content of the speech and the background noise, and encodes them into latent variables at different levels. Speech enhancement is then achieved by manipulating the latent variables that represent the noise conditions. Objective evaluation results show that the proposed model achieves larger improvements than classic methods and requires less training time than the other models.
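The abstract does not give the layer equations, so the following is only a minimal sketch of the standard IndRNN recurrence, h_t = ReLU(W x_t + u * h_{t-1} + b), in which each hidden unit has a single scalar recurrent weight instead of a full recurrent matrix. The function names `indrnn_step` and `indrnn_run`, and all shapes and values, are illustrative assumptions and are not taken from the paper.

```python
def indrnn_step(x_t, h_prev, W, u, b):
    """One IndRNN step (illustrative): h_t = relu(W @ x_t + u * h_prev + b).

    W is a list of rows (hidden_size x input_size); u and b are per-unit
    vectors. The recurrent term is elementwise, so unit n only sees its
    own previous state h_prev[n].
    """
    h_t = []
    for n in range(len(u)):
        # Input projection for hidden unit n.
        pre = sum(W[n][k] * x_t[k] for k in range(len(x_t)))
        # Independent (elementwise) recurrence plus bias.
        pre += u[n] * h_prev[n] + b[n]
        # ReLU activation.
        h_t.append(max(0.0, pre))
    return h_t


def indrnn_run(xs, W, u, b):
    """Run the recurrence over a sequence, starting from a zero state."""
    h = [0.0] * len(u)
    for x_t in xs:
        h = indrnn_step(x_t, h, W, u, b)
    return h
```

Because the recurrent weights are per-unit scalars, each unit's state evolves independently within a layer; mixing across units happens only through the input projection of the next layer.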