Adaptation of Tandem Hidden Markov Models for Non-Speech Audio Event Detection.

Mark Hasegawa-Johnson, Xiaodan Zhuang, Xi Zhou, Camille Goudeseune, Thomas Huang
DOI: https://doi.org/10.1121/1.4784503
2009-01-01
The Journal of the Acoustical Society of America
Abstract: Non-speech audio event detection (AED) could be used for low-cost, spatially diffuse surveillance applications, e.g., monitoring of vehicle activity in a national park, or of footsteps in a hallway. Experiments have shown that non-speech AED benefits from dynamic inference strategies such as the hidden Markov model (HMM), but that the acoustic features useful for non-speech events may not be the same as those useful for speech. One possible solution is a tandem HMM: an HMM whose observation vector is constructed from the output of an instantaneous discriminative classifier, e.g., a neural network. The use of tandem HMMs for non-speech AED is hindered, however, by the relatively small size of most non-speech-audio training corpora. This talk will demonstrate that tandem HMMs can be trained to detect non-speech audio events using a novel form of regularized training: Baum–Welch back-propagation (as proposed by Bengio et al.), using the conjugate-gradient adaptive form of the Baum–Welch auxiliary function (as proposed by Lee et al., and as commonly used in maximum a posteriori HMM adaptation).
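The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' Baum–Welch back-propagation procedure. It only shows (a) how per-frame classifier outputs might be turned into tandem observation vectors, assuming log-posteriors are used as features, and (b) the standard maximum a posteriori update of a single Gaussian mean, which pulls the estimate toward a prior when adaptation data are scarce. The function names, the `floor` value, and the prior weight `tau` are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Convert one frame's raw classifier scores into posterior probabilities."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tandem_features(frame_scores, floor=1e-10):
    """Build tandem observations: one log-posterior vector per frame.

    Assumption: log-posteriors are used directly as the HMM observation
    vector; real systems often add a decorrelating transform afterwards.
    """
    return [[math.log(max(p, floor)) for p in softmax(scores)]
            for scores in frame_scores]


def map_adapt_mean(prior_mean, frames, occupancies, tau=10.0):
    """Standard MAP update of one Gaussian mean.

    occupancies[t] is the state-occupancy probability of frame t
    (in practice obtained from the forward-backward algorithm);
    tau (hypothetical value) controls how strongly the prior is trusted.
    """
    total = sum(occupancies)
    dim = len(prior_mean)
    weighted = [sum(g * x[d] for g, x in zip(occupancies, frames))
                for d in range(dim)]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + total)
            for d in range(dim)]


# Usage: two frames, two event classes, occupancies assumed to be 1.0.
observations = tandem_features([[2.0, 0.0], [0.0, 2.0]])
adapted_mean = map_adapt_mean([0.0, 0.0], observations, [1.0, 1.0], tau=2.0)
```

With few adaptation frames the total occupancy is small relative to `tau`, so the adapted mean stays close to the prior; this is the regularizing effect that makes adaptation attractive for small training corpora.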