A Novel Training Strategy Using Dynamic Data Generation for Deep Neural Network Based Speech Enhancement.
Mao-Kui He, Jun Du, Zi-Rui Wang, Lei Sun
DOI: https://doi.org/10.23919/APSIPA.2018.8659607
2018-01-01
Abstract: In this paper, a new training strategy is proposed to address a key issue in deep neural network (DNN) based speech enhancement: how to use limited data effectively, given the growing awareness that more training data is needed in the deep learning era. Traditionally, a fixed training set consisting of a large number of paired utterances, i.e., clean speech and the corresponding noisy speech, must be prepared in advance. However, it seems inevitable that the amount of noisy speech must be enlarged in the training stage to make the model adaptive to various noise environments. Moreover, involving more training data leads to longer training time, as the fixed training set must be repeated for multiple epochs. In this study, we propose a novel training method based on dynamic data generation. The key idea is that the synthesis of noisy speech data is performed on the fly, from the utterance level to the batch level. This new training method offers three advantages. First, because training batches are generated dynamically, it is not necessary to prepare and store a fixed training set as in the conventional training method. Second, with the same training time as the conventional method, more diverse noisy data are fed into the DNN model. Finally, different evaluation measures, including PESQ, STOI, LSD, and SegSNR, are consistently improved for unseen noise types, demonstrating the better generalization capability of the proposed training strategy.
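The idea of synthesizing noisy speech on the fly can be illustrated with a minimal sketch. The abstract does not specify how the mixing is carried out, so everything below is an assumption made for illustration: the function names `mix_at_snr` and `dynamic_batches`, the uniform random choice of noise segment and SNR, and the SNR values in the usage note are hypothetical and are not taken from the paper.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, rng):
    """Add a randomly chosen noise segment to `clean` at the requested SNR in dB.

    Hypothetical helper: the paper's actual mixing procedure is not given here.
    """
    # Tile the noise if it is shorter than the clean utterance.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    # Pick a random segment of the noise with the same length as the utterance.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(segment ** 2) + 1e-12  # guard against silent noise
    # Scale the noise so that 10*log10(clean_power / scaled_noise_power) = snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * segment


def dynamic_batches(clean_utts, noises, batch_size, snr_choices_db,
                    num_batches, seed=0):
    """Yield batches of (noisy, clean) pairs synthesized at training time.

    No fixed noisy training set is stored: each batch draws a fresh
    combination of utterance, noise recording, noise segment, and SNR.
    """
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            clean = clean_utts[rng.integers(len(clean_utts))]
            noise = noises[rng.integers(len(noises))]
            snr_db = float(rng.choice(snr_choices_db))
            batch.append((mix_at_snr(clean, noise, snr_db, rng), clean))
        yield batch
```

As a usage example, `dynamic_batches(clean_utts, noises, 32, [0, 5, 10, 15], 1000)` would yield 1000 batches of 32 freshly mixed pairs. Because the mixtures are drawn anew for every batch, the model rarely sees the same noisy utterance twice, which is one plausible reading of how more varied noisy data can be presented in the same training time.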