Investigation on dimensionality reduction of concatenated features with deep neural network for LVCSR systems

Yebo Bao, Hui Jiang, Cong Liu, Yu Hu, Lirong Dai
DOI: https://doi.org/10.1109/ICoSP.2012.6491550
2012-01-01
Abstract: The hybrid context-dependent deep neural network hidden Markov model (CD-DNN-HMM) has achieved significant improvements on various challenging large vocabulary continuous speech recognition (LVCSR) tasks in the past few years. Recently, it was further reported that the gains of DNNs are almost entirely attributable to using features concatenated from consecutive speech frames as the DNN's input. This result indicates that DNNs are highly effective at exploiting high-dimensional features. For GMMs, however, we must resort to dimensionality reduction techniques to avoid the “curse of dimensionality”. In this paper, we attempt to derive compact and informative low-dimensional representations of concatenated features for GMMs. The simplest option, PCA, is considered first, but it does not work well in this situation. We then focus on investigating DNN-based bottleneck features. Experiments on a Mandarin LVCSR task and the Switchboard task both show that the recognition performance of GMM-HMMs trained with bottleneck features (BN-GMM-HMMs) can be comparable to that of CD-DNN-HMMs. Moreover, when discriminative training is applied, BN-GMM-HMMs surprisingly provide nearly 8% relative error reduction over CD-DNN-HMMs on the Mandarin LVCSR task.
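The abstract contrasts two ways of turning concatenated (spliced) frames into a compact GMM input: PCA and DNN bottleneck features. The following is only a minimal sketch of the two pipelines, not the authors' setup. It assumes a ±5-frame context window, 39-dimensional base features, random numbers in place of real speech features, and an untrained network with hypothetical layer sizes and sigmoid hidden units feeding a linear bottleneck; the abstract specifies none of these. The function names `splice_frames`, `pca_fit`, `pca_transform` and `bottleneck_features` are illustrative.

```python
import numpy as np


def splice_frames(feats: np.ndarray, left: int = 5, right: int = 5) -> np.ndarray:
    """Concatenate each frame with its `left` preceding and `right` following
    frames. Edges are padded by repeating the first/last frame.
    feats: (T, D) -> (T, (left + 1 + right) * D)."""
    T = feats.shape[0]
    padded = np.concatenate(
        [np.repeat(feats[:1], left, axis=0), feats, np.repeat(feats[-1:], right, axis=0)],
        axis=0,
    )
    # Column block k holds frame t + k - left for every frame t.
    return np.concatenate([padded[k:k + T] for k in range(left + 1 + right)], axis=1)


def pca_fit(x: np.ndarray, n_components: int):
    """Return the data mean and a (dim, n_components) projection matrix."""
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components].T


def pca_transform(x: np.ndarray, mean: np.ndarray, proj: np.ndarray) -> np.ndarray:
    return (x - mean) @ proj


def bottleneck_features(x, weights, biases, bn_index):
    """Run x through the layers up to and including the bottleneck layer
    (index bn_index) and return its activations; the layers above it,
    including the output layer, are never evaluated."""
    h = x
    for i in range(bn_index + 1):
        z = h @ weights[i] + biases[i]
        # Assumption: sigmoid hidden units and a linear bottleneck layer.
        h = z if i == bn_index else 1.0 / (1.0 + np.exp(-z))
    return h


rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 39))       # random stand-in for 200 frames
spliced = splice_frames(feats)               # 11 frames x 39 dims = 429 dims

mean, proj = pca_fit(spliced, 39)
pca_feats = pca_transform(spliced, mean, proj)

# Hypothetical, untrained network: 429 -> 512 -> 512 -> 39 (bottleneck).
sizes = [429, 512, 512, 39]
weights = [rng.standard_normal((a, b)) * 0.01 for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
bn_feats = bottleneck_features(spliced, weights, biases, bn_index=2)

print(spliced.shape, pca_feats.shape, bn_feats.shape)
# prints (200, 429) (200, 39) (200, 39)
```

Both routes yield 39-dimensional vectors that a GMM-HMM could model. The difference is how the projection is learned: PCA keeps the directions of largest variance without regard to phonetic classes, whereas a bottleneck network would typically first be trained on frame-level HMM state targets so that the narrow layer retains class-relevant information.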