Auditory Features Based on Gammatone Filters for Robust Speech Recognition.

Jun Qi,Dong Wang,Yi Jiang,Runsheng Liu
DOI: https://doi.org/10.1109/iscas.2013.6571843
2013-01-01
Abstract:A major challenge for automatic speech recognition (ASR) relates to significant performance reduction in noisy environments. Recent research has shown that auditory features based on Gammatone filters are promising to improve robustness of ASR systems against noise, though the research is far from extensive and generalizability of the new features is unknown. This paper presents our implementation of the Gamma-tone filter-based feature and the experimental results on Mandarin speech data. By some thorough designs, we obtained significant performance gains with the new feature in various noise conditions when compared with the widely used MFCC and PLP features. A particular novelty of our implementation is that the filter design is purely in the time domain. This means that the channel signals are obtained with a set of Gammatone filters applied directly on the speech signals in time domain, which is totally different from the commonly adopted frequency-domain design that first converts signals to spectra and then applies the filter banks upon them. The time-domain implementation on the one hand avoids the approximation introduced by short-time spectral analysis and hence is more precise; and on the other hand, it avoids the complex spectral computation and hence simplifies hardware realization.
What problem does this paper attempt to address?