Learning Dynamic Features with Neural Networks for Phoneme Recognition

Xin Zheng, Zhiyong Wu, Helen Meng, Lianhong Cai
DOI: https://doi.org/10.1109/icassp.2014.6854055
2014-01-01
Abstract: Dynamic features such as the delta and delta-delta of basic acoustic features have long been used in various speech applications and give satisfactory performance. The explicit physical meaning and simplicity of dynamic features clearly contribute to their prevalence. In this paper, we propose a new neural-network framework that learns alternatives to the traditional delta and higher-order differences. Rather than preserving that interpretability and simplicity, our framework learns a new transformation that imitates what differences do but is better suited to a specific task such as phoneme recognition. We determine the best way to learn such a transformation among several of the most plausible alternatives. Our experiments indicate that dynamic features obtained with a transformation learned this way outperform traditional differences in both frame classification and phoneme recognition. The improvement is even clearer when higher-order differences are applied.
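The abstract uses delta and delta-delta features as its baseline without stating a formula. As a hedged sketch, assuming the common regression-style delta over plus/minus N frames with edge frames replicated (the window size and edge handling used in the paper are not given here), the code below computes deltas and then shows that the same operation is just a fixed set of weights applied over a context window of frames. The helpers `delta_weights` and `apply_window` are hypothetical names introduced for illustration. A network that fits those weights (or a nonlinear mapping) to a task loss is one way to picture a "learned alternative" to differences; it is not a reproduction of the authors' architecture.

```python
from typing import List, Sequence

Frame = List[float]


def delta(frames: Sequence[Sequence[float]], n_window: int = 2) -> List[Frame]:
    """Regression-style delta over +/- n_window frames (assumed formula).

    d_t = sum_{n=1}^{N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1}^{N} n^2)
    Frames beyond either edge are replaced by the nearest edge frame.
    """
    if n_window < 1:
        raise ValueError("n_window must be >= 1")
    T = len(frames)
    if T == 0:
        return []
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, n_window + 1))
    out: List[Frame] = []
    for t in range(T):
        acc = [0.0] * dim
        for n in range(1, n_window + 1):
            nxt = frames[min(t + n, T - 1)]  # clamp at the last frame
            prv = frames[max(t - n, 0)]      # clamp at the first frame
            for k in range(dim):
                acc[k] += n * (nxt[k] - prv[k])
        out.append([v / denom for v in acc])
    return out


def delta_weights(n_window: int = 2) -> List[float]:
    """Hypothetical helper: the same delta written as fixed weights w_j, j = -N..N."""
    denom = 2 * sum(n * n for n in range(1, n_window + 1))
    return [j / denom for j in range(-n_window, n_window + 1)]


def apply_window(frames: Sequence[Sequence[float]], weights: Sequence[float]) -> List[Frame]:
    """Hypothetical helper: y_t = sum_j w_j * c_{t+j}, a 1-D filter over time.

    With delta_weights() this reproduces delta(); a learnable layer would
    instead fit `weights` (or a richer mapping) to a task loss.
    """
    if len(weights) % 2 != 1:
        raise ValueError("weights must have odd length 2N+1")
    N = len(weights) // 2
    T = len(frames)
    if T == 0:
        return []
    dim = len(frames[0])
    out: List[Frame] = []
    for t in range(T):
        acc = [0.0] * dim
        for j, w in zip(range(-N, N + 1), weights):
            f = frames[min(max(t + j, 0), T - 1)]  # same edge clamping as delta()
            for k in range(dim):
                acc[k] += w * f[k]
        out.append(acc)
    return out


if __name__ == "__main__":
    ramp = [[float(t)] for t in range(5)]  # one feature rising by 1 per frame
    d = delta(ramp, n_window=2)
    dd = delta(d, n_window=2)              # delta-delta = delta of the deltas
    print([round(x[0], 3) for x in d])     # [0.5, 0.8, 1.0, 0.8, 0.5]
    print([round(x[0], 3) for x in dd])
```

Higher-order differences follow the same pattern of applying `delta` to its own output, which is the setting where the abstract reports the learned transformation helping most. Writing delta as `apply_window(frames, delta_weights(N))` makes explicit that the traditional feature is one fixed point in a space of linear context filters that a trained layer could search.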