Speech recognition method based on DNN-LSTM combined with Wiener filtering algorithm

Keliang Song, Tianyu Zhu, Haoyan Pei
DOI: https://doi.org/10.1109/ICCASIT55263.2022.9987143
2022-10-12
Abstract: Voice activity detection (VAD) algorithms based on deep neural networks (DNNs) ignore the temporal correlation of acoustic features between speech frames, which greatly reduces their performance in noisy environments. To solve this problem, this paper proposes a hybrid network structure based on a DNN and a long short-term memory (LSTM) network, combining the nonlinear learning ability and the long-sequence analysis ability of the two networks to learn how speech signals change over time; the network is optimized with the wavelet transform and the backpropagation through time (BPTT) algorithm. The signal processing framework is also combined with the Wiener filtering algorithm to cope with noise types not seen during training. Compared with a standalone deep learning network or a standalone speech signal processing system, the DNN-LSTM-Wiener model has better acoustic modeling and speech recognition ability in realistic environments. Experiments on the TIMIT corpus compare the model with traditional acoustic models. The results show that the utterance error rate of the DNN-LSTM model combined with the Wiener filtering algorithm decreases to 21.68%, giving higher recognition accuracy, and that the model still detects speech accurately at lower signal-to-noise ratios.
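The abstract does not say which form of the Wiener filter is used. As an illustration only, the sketch below shows the textbook frequency-domain Wiener gain, G = ξ / (1 + ξ), where the SNR ξ in each frequency bin is estimated by simple power subtraction. The function names `wiener_gain` and `apply_gain`, the `gain_floor` parameter, and the power-subtraction SNR estimate are assumptions made for this example, not details taken from the paper.

```python
def wiener_gain(noisy_power, noise_power, gain_floor=0.0):
    """Per-bin Wiener gain G = xi / (1 + xi).

    noisy_power: power spectrum of the noisy frame, one value per bin.
    noise_power: estimated noise power spectrum, one value per bin.
    gain_floor:  hypothetical lower bound on the gain (assumption).
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        if p_n <= 0.0:
            # No noise estimated in this bin: pass the signal through.
            gains.append(1.0)
            continue
        # SNR estimate by power subtraction, clipped at zero (assumption).
        xi = max(p_y - p_n, 0.0) / p_n
        gains.append(max(xi / (1.0 + xi), gain_floor))
    return gains


def apply_gain(noisy_spectrum, gains):
    """Scale each (possibly complex) spectral bin by its gain."""
    return [g * x for g, x in zip(gains, noisy_spectrum)]


if __name__ == "__main__":
    # Toy three-bin frame: only the first bin carries power above the noise.
    print(wiener_gain([4.0, 1.0, 0.5], [1.0, 1.0, 1.0]))
```

In a pipeline of the kind the abstract describes, a gain like this would be applied to each short-time spectrum before feature extraction, so that the network sees a denoised signal even for noise types it was not trained on.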
Engineering, Computer Science