Structure Growth for Small-Footprint Speech Recognition

Jiayao Wu, Zhiyuan Tang, Dong Wang
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023275
2019-01-01
Abstract: Modern automatic speech recognition (ASR) is based on large-scale deep neural networks (DNNs) with various architectures. For small-footprint applications running on low-power chips, however, the size of the DNN must be severely constrained. In this case, training a generalizable acoustic model is not feasible, especially when the acoustic conditions are diverse. Most existing approaches to small-footprint networks start from a large net and reduce its size by pruning. In this paper, we investigate the reverse idea: starting from a small net and growing it gradually. This structure-growth approach follows a 'general to specific' principle. We start from the AdaBoost algorithm, which builds specific nets for error-prone data, and then propose a new algorithm, ConBoost, which builds specific nets for specific conditions. Our experiments on a small-footprint ASR task demonstrated that both AdaBoost and ConBoost outperform the baseline and other comparative methods, including bagging and double-net retraining. Furthermore, ConBoost performs better than AdaBoost.
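The abstract does not spell out the boosting equations. As a reference point only, here is a minimal sketch of one round of standard multi-class AdaBoost (the SAMME variant), assuming each component net makes hard frame-level class decisions. The function names `samme_update` and `combine` are hypothetical, this is not the authors' implementation, and ConBoost's condition-specific training is not modeled here. It shows how boosting re-weights error-prone frames so that the next, newly added net concentrates on them, and how component nets are combined by weighted vote.

```python
import math
from typing import List, Tuple


def samme_update(weights: List[float], predictions: List[int],
                 labels: List[int], num_classes: int) -> Tuple[float, List[float]]:
    """One round of multi-class AdaBoost (SAMME), assumed for illustration.

    Returns the component weight alpha and the renormalised sample weights.
    """
    total = sum(weights)
    # Weighted error rate of the current component net.
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y) / total
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    # SAMME adds log(K - 1), so alpha > 0 whenever the net beats random guessing.
    alpha = math.log((1 - err) / err) + math.log(num_classes - 1)
    # Up-weight misclassified (error-prone) samples so the next net focuses on them.
    boosted = [w * math.exp(alpha) if p != y else w
               for w, p, y in zip(weights, predictions, labels)]
    z = sum(boosted)
    return alpha, [w / z for w in boosted]


def combine(alphas: List[float], net_predictions: List[List[int]],
            num_classes: int) -> List[int]:
    """Weighted vote over component nets: each net votes with its alpha."""
    num_samples = len(net_predictions[0])
    decisions = []
    for i in range(num_samples):
        scores = [0.0] * num_classes
        for alpha, preds in zip(alphas, net_predictions):
            scores[preds[i]] += alpha
        decisions.append(max(range(num_classes), key=lambda k: scores[k]))
    return decisions


if __name__ == "__main__":
    # Toy example: 4 frames, 3 classes, one misclassified frame.
    alpha, new_weights = samme_update([0.25] * 4, [0, 1, 2, 0], [0, 1, 2, 1], 3)
    print(alpha, new_weights)
```

In the toy run, the weighted error is 0.25, so alpha equals log(3) + log(2) = log(6), and the single misclassified frame ends up carrying two thirds of the total weight for the next round.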