Scaling Speech Enhancement in Unseen Environments with Noise Embeddings

Gil Keren, Jing Han, Björn Schuller
DOI: https://doi.org/10.48550/arXiv.1810.12757
2018-10-26
Abstract: We address the problem of speech enhancement generalisation to unseen environments by performing two manipulations. First, we embed an additional recording from the environment alone, and use this embedding to alter activations in the main enhancement subnetwork. Second, we scale the number of noise environments present at training time to 16,784 different environments. Experimental results show that both manipulations reduce the word error rate of a pretrained speech recognition system and improve enhancement quality according to a number of performance measures. Specifically, our best model reduces the word error rate from 34.04% on noisy speech to 15.46% on the enhanced speech. Enhanced audio samples can be found at https://speechenhancement.page.link/samples.
Audio and Speech Processing, Machine Learning, Sound
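
The two word error rates quoted in the abstract can be restated as an absolute and a relative reduction. The snippet below is a minimal sketch of that arithmetic only. The function name `relative_wer_reduction` and the variable names are illustrative and do not come from the paper; the only values taken from the abstract are the two WER figures.

```python
def relative_wer_reduction(wer_before: float, wer_after: float) -> float:
    """Return the WER reduction as a fraction of the starting WER.

    Hypothetical helper for illustration; not part of the paper.
    """
    if wer_before <= 0:
        raise ValueError("wer_before must be positive")
    return (wer_before - wer_after) / wer_before


noisy_wer = 34.04     # % WER on noisy speech (from the abstract)
enhanced_wer = 15.46  # % WER on enhanced speech (from the abstract)

absolute_drop = noisy_wer - enhanced_wer
relative_drop = relative_wer_reduction(noisy_wer, enhanced_wer)

print(f"absolute reduction: {absolute_drop:.2f} percentage points")  # 18.58
print(f"relative reduction: {relative_drop:.1%}")                    # 54.6%
```

In other words, the best model reported in the abstract removes a little over half of the recognition errors measured on the noisy input.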