Combining Information from Multi-Stream Features Using Deep Neural Network in Speech Recognition

Pan Zhou, Lirong Dai, Qingfeng Liu, Hui Jiang
DOI: https://doi.org/10.1109/icosp.2012.6491549
2012-01-01
Abstract: This paper addresses the integration of multi-stream features within the hybrid artificial neural network/hidden Markov model (ANN-HMM) framework. We investigate combining log filter bank and MFCC features as separate streams for phoneme recognition, and we propose an intermediate integration method to fuse the information from the different feature sets. Using a deep learning algorithm to train the deep neural network (DNN), we explore several stream combination methods. Recognition experiments with a DNN-HMM system on the TIMIT speech corpus show that the proposed approach not only outperforms the best single stream, with a 6.1% relative reduction in phone error rate (PER), but also outperforms the other fusion strategies.
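The abstract does not describe the network layout of the intermediate integration. The sketch below assumes that "intermediate" means the two streams are fused at a hidden layer, as opposed to concatenating the raw features at the input or combining per-stream posteriors at the output. Everything in it is hypothetical and chosen only for illustration: the class name `IntermediateFusionNet`, one sigmoid hidden layer per stream, the dimensions (40 log filter bank coefficients, 13 MFCCs), and the random initialisation. The real DNN would be deeper, would be trained rather than randomly initialised, and would typically see a context window of frames. The sketch also includes `relative_reduction`, which implements the standard definition of a relative error reduction, i.e. (baseline − new) / baseline, the kind of figure reported as the 6.1% PER improvement.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: y_j = sum_i x_i * W[i][j] + b_j."""
    return [sum(xi * weights[i][j] for i, xi in enumerate(x)) + bias[j]
            for j in range(len(bias))]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative, not trained)."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    b = [0.0] * n_out
    return w, b


class IntermediateFusionNet:
    """Hypothetical two-stream net: each stream has its own hidden layer,
    the hidden activations are concatenated (the fusion point), and a
    shared softmax layer outputs class posteriors for the HMM."""

    def __init__(self, dim_fbank, dim_mfcc, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.fbank_layer = init_layer(dim_fbank, n_hidden, rng)
        self.mfcc_layer = init_layer(dim_mfcc, n_hidden, rng)
        self.output_layer = init_layer(2 * n_hidden, n_classes, rng)

    def forward(self, fbank, mfcc):
        h_fbank = sigmoid(dense(fbank, *self.fbank_layer))  # stream 1
        h_mfcc = sigmoid(dense(mfcc, *self.mfcc_layer))     # stream 2
        fused = h_fbank + h_mfcc  # list concatenation = hidden-layer fusion
        return softmax(dense(fused, *self.output_layer))


def relative_reduction(baseline_err, new_err):
    """Relative error reduction: (baseline - new) / baseline."""
    return (baseline_err - new_err) / baseline_err


if __name__ == "__main__":
    net = IntermediateFusionNet(dim_fbank=40, dim_mfcc=13,
                                n_hidden=8, n_classes=5)
    posteriors = net.forward([0.1] * 40, [0.2] * 13)
    print(len(posteriors), round(sum(posteriors), 6))
```

Fusing at a hidden layer lets each stream first learn its own representation before the shared layers combine them. That is the usual motivation for intermediate integration over plain input concatenation, though the paper's exact design may differ.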