Speech Enhancement Based on Full-Sentence Correlation and Clean Speech Recognition

Ming Ji, Danny Crookes
DOI: https://doi.org/10.1109/taslp.2017.2651406
2017-01-01
Abstract: Conventional speech enhancement methods, based on frame, multiframe, or segment estimation, require knowledge about the noise. This paper presents a new method that aims to reduce or effectively remove this requirement. It is shown that by using the zero-mean normalized correlation coefficient (ZNCC) as the comparison measure, and by extending the effective length of speech segment matching to sentence-long speech utterances, it is possible to obtain an accurate speech estimate from noise without requiring specific knowledge about the noise. The new method, thus, could be used to deal with unpredictable noise or noise without proper training data. This paper is focused on realizing and evaluating this potential. We propose a novel realization that integrates full-sentence speech correlation with clean speech recognition, formulated as a constrained maximization problem, to overcome the data sparsity problem. Then we propose an efficient implementation algorithm to solve this constrained maximization problem to produce speech sentence estimates. For evaluation, we build the new system on one training dataset and test it on two different test datasets across two databases, for a range of different noises including highly nonstationary ones. It is shown that the new approach, without any estimation of the noise, is able to significantly outperform conventional methods that use optimized noise tracking, in terms of various objective measures including automatic speech recognition.
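
As an illustration of the comparison measure named in the abstract, the following is a minimal sketch of the standard ZNCC definition applied to two equal-length sequences. It is not the authors' implementation: the abstract does not specify the feature domain or how the measure is accumulated over a full sentence, so the function name `zncc`, the toy sequences, and the convention of returning 0.0 for constant inputs are all assumptions made here for illustration.

```python
import math
from typing import Sequence


def zncc(x: Sequence[float], y: Sequence[float]) -> float:
    """Zero-mean normalized correlation coefficient of two sequences.

    Standard textbook definition; how the paper applies it to
    sentence-long speech features is not stated in the abstract.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Remove the mean of each sequence (the "zero-mean" part).
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    # Normalize by the product of the two centered norms.
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0.0:
        # Assumption: treat a constant sequence as uncorrelated.
        return 0.0
    return num / den


# Hypothetical toy data: a short pattern and a gain-and-offset copy of it.
clean = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
scaled = [3.0 * v + 2.0 for v in clean]
inverted = [-v for v in clean]

print(zncc(clean, scaled))    # close to 1: unchanged by positive gain and offset
print(zncc(clean, inverted))  # close to -1: sign-flipped pattern
```

The measure is unaffected by a positive gain or a constant offset applied to either sequence, which is one reason a correlation-style score can compare signals of differing level without an explicit level estimate.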