Music removal by convolutional denoising autoencoder in speech recognition.

Mengyuan Zhao, Dong Wang, Zhiyong Zhang, Xuewei Zhang
DOI: https://doi.org/10.1109/APSIPA.2015.7415289
2015-01-01
Abstract: Music embedding often causes significant performance degradation in automatic speech recognition (ASR). This paper proposes a music-removal method based on a denoising autoencoder (DAE) that learns and removes music from music-embedded speech signals. In particular, we focus on the convolutional denoising autoencoder (CDAE), which can learn local musical patterns through convolutional feature extraction. Our study shows that the CDAE model can learn patterns of music in different genres and that CDAE-based music removal yields a significant performance improvement for ASR. Additionally, we demonstrate that this music-removal approach is largely language-independent: a model trained with data in one language can be applied to remove music from speech in another language, and models trained with multilingual data may lead to better performance.
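The denoising objective described in the abstract (train a convolutional model on music-corrupted input so that its output matches the clean speech) can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not the authors' model: the abstract does not specify the CDAE architecture, so a single linear 1-D convolutional filter stands in for the network, two sinusoids stand in for speech and music features, and the names `conv1d_same`, `mse`, and `train_filter` are hypothetical.

```python
import math

def conv1d_same(x, w):
    """Apply filter w to sequence x with zero padding, returning len(x) outputs."""
    half = len(w) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            i = t + k - half
            if 0 <= i < len(x):
                acc += wk * x[i]
        out.append(acc)
    return out

def mse(y, target):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y, target)) / len(y)

def train_filter(noisy, clean, width=5, lr=0.05, steps=200):
    """Fit one convolutional filter by gradient descent on the denoising loss."""
    half = width // 2
    w = [0.0] * width
    w[half] = 1.0  # start from the identity filter (output == input)
    n = len(noisy)
    for _ in range(steps):
        y = conv1d_same(noisy, w)
        err = [a - b for a, b in zip(y, clean)]
        grad = []
        for k in range(width):
            g = 0.0
            for t in range(n):
                i = t + k - half
                if 0 <= i < n:
                    g += err[t] * noisy[i]
            grad.append(2.0 * g / n)  # d(MSE)/d(w[k])
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Toy stand-ins (assumptions): a slow sinusoid for "speech", a faster one for "music".
T = 200
clean = [math.sin(0.3 * t) for t in range(T)]
music = [0.5 * math.sin(2.0 * t) for t in range(T)]
noisy = [s + m for s, m in zip(clean, music)]

identity = [0.0, 0.0, 1.0, 0.0, 0.0]
loss_before = mse(conv1d_same(noisy, identity), clean)  # error with no denoising
w = train_filter(noisy, clean)
loss_after = mse(conv1d_same(noisy, w), clean)          # error after training
```

Comparing `loss_before` with `loss_after` shows how much of the interfering component the learned filter suppresses. A CDAE in the sense of the abstract would replace the single linear filter with stacked nonlinear convolutional layers operating on real speech features.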