Convolutional maxout neural networks for low-resource speech recognition

Meng Cai, Yongzhe Shi, Jian Kang, Jia Liu, Tengrong Su
DOI: https://doi.org/10.1109/ISCSLP.2014.6936676
2014-01-01
Abstract: Building speech recognition systems with limited data resources is a fast-progressing topic. In this paper, we propose the convolutional maxout neural network acoustic model for low-resource speech recognition. There are three motivations for this model. The first is to exploit prior knowledge of local speech spectrum features by applying convolutional structures. The second is to shrink the model size and enable better optimization by using the maxout nonlinearity. The third is to enhance model generalization and control overfitting by applying dropout training. All three motivations compensate for the lack of training data. Experiments on a 24-hour subset of the Switchboard corpus show that the convolutional structure, the maxout nonlinearity, and dropout training each improve performance on this task, and that the combination of the three techniques achieves a relative improvement of over 10.0% over a convolutional neural network baseline.
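As a rough illustration of two of the components named in the abstract, the following NumPy sketch shows a maxout nonlinearity followed by dropout. It is not the authors' implementation: the layer sizes, the number of pieces per maxout unit, the dropout rate, and the use of "inverted" dropout (rescaling at training time) are all assumptions chosen for the example, and the convolutional front end is omitted.

```python
import numpy as np

def maxout(z, pieces):
    """Maxout: group the linear outputs into sets of `pieces` and keep each group's maximum."""
    batch, units = z.shape
    assert units % pieces == 0, "unit count must be divisible by the group size"
    return z.reshape(batch, units // pieces, pieces).max(axis=2)

def dropout(h, rate, rng):
    """Inverted dropout (an assumed variant): zero activations with probability `rate`
    and rescale the survivors so the expected activation is unchanged."""
    keep = rng.random(h.shape) >= rate
    return h * keep / (1.0 - rate)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 40))        # 4 frames of 40 hypothetical input features
W = rng.standard_normal((40, 6)) * 0.1  # 6 linear units = 3 maxout units of 2 pieces each
b = np.zeros(6)

h = maxout(x @ W + b, pieces=2)         # shape (4, 3): half as many outputs as linear units
h_train = dropout(h, rate=0.5, rng=rng) # applied during training only
```

Because each maxout unit emits one value per group of linear pieces, the next layer receives fewer inputs than there are linear units, which is one way to read the abstract's point about shrinking the model size.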