Speech Recognition Based on Deep Tensor Neural Network and Multifactor Feature.

Yahui Shan, Min Liu, Qingran Zhan, Shixuan Du, Jing Wang, Xiang Xie
DOI: https://doi.org/10.1109/apsipaasc47483.2019.9023251
2019-01-01
Abstract: This paper presents a speech recognition system based on a deep tensor neural network that uses a multifactor feature as the input to the acoustic model. First, a deep neural network is trained on the MOCHA database [1] to estimate articulatory features from input speech. Mel-frequency cepstral coefficients are then combined with the articulatory features to form the multifactor feature. A deep tensor neural network, which involves tensor interactions among neurons, serves as the acoustic model. Speech recognition results indicate that the multifactor feature improves recognition performance under both clean and noisy background conditions, and that, because of its tensor interactions, the deep tensor neural network models multifactor features better than a conventional deep neural network does.
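The two ideas in the abstract, a multifactor input feature and tensor interactions among neurons, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the multifactor feature is a frame-wise concatenation of MFCCs and articulatory estimates, and that a tensor layer forms two sigmoid projections of its input and combines them through a three-way weight tensor. All names (`multifactor_feature`, `tensor_layer`) and all sizes are hypothetical and chosen only for illustration; the inputs are random placeholders, not real speech features.

```python
import numpy as np


def multifactor_feature(mfcc, articulatory):
    """Concatenate per-frame MFCCs with per-frame articulatory estimates.

    Assumption: the two streams are frame-aligned and simply concatenated.
    """
    if mfcc.shape[0] != articulatory.shape[0]:
        raise ValueError("both streams must have the same number of frames")
    return np.concatenate([mfcc, articulatory], axis=1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tensor_layer(x, w1, w2, u, bias):
    """Two projections of x followed by a bilinear (tensor) interaction.

    h1 = sigmoid(x @ w1), h2 = sigmoid(x @ w2)
    out[t, k] = sigmoid(sum_{i,j} h1[t, i] * h2[t, j] * u[i, j, k] + bias[k])
    """
    h1 = sigmoid(x @ w1)
    h2 = sigmoid(x @ w2)
    pre_activation = np.einsum("ti,tj,ijk->tk", h1, h2, u) + bias
    return sigmoid(pre_activation)


# Illustrative sizes only; none of them come from the paper.
rng = np.random.default_rng(0)
frames, n_mfcc, n_art = 5, 13, 14
d1, d2, d_out = 8, 6, 10

mfcc = rng.standard_normal((frames, n_mfcc))  # placeholder for real MFCCs
art = rng.standard_normal((frames, n_art))    # placeholder for DNN-estimated articulatory features
x = multifactor_feature(mfcc, art)

w1 = 0.1 * rng.standard_normal((x.shape[1], d1))
w2 = 0.1 * rng.standard_normal((x.shape[1], d2))
u = 0.1 * rng.standard_normal((d1, d2, d_out))
bias = np.zeros(d_out)

out = tensor_layer(x, w1, w2, u, bias)
print(x.shape, out.shape)
```

In this sketch each output unit depends on products of pairs of hidden activations, whereas a standard feed-forward layer uses only a weighted sum of individual activations. That multiplicative term is what "tensor interactions" refers to here.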