Speech Feature Mapping based on Switching Linear Dynamic System

N. Kim, T. Kang, S. Kang, C. Han, D. Hong
DOI: https://doi.org/10.1109/tasl.2011.2163397
2011-01-01
IEEE Transactions on Audio, Speech, and Language Processing
Abstract: Signals originating from the same speech source usually appear different depending on a variety of acoustic effects, such as background noise, linear or nonlinear distortions introduced by recording devices, or reverberation. These acoustic effects result in mismatches between the trained speech recognition models and the input speech. One well-known approach to reducing this mismatch is to map the distorted speech features to their clean counterparts. The mapping function is usually trained on a set of stereo data consisting of simultaneous recordings obtained in both the reference and target conditions. In this paper, we propose the switching linear dynamic system (SLDS) as a useful model for speech feature sequence mapping. In contrast to conventional vector-to-vector mapping algorithms, SLDS can describe sequence-to-sequence mapping in a systematic way. The proposed approach is applied to robust speech recognition in various environmental conditions and shows a dramatic improvement in recognition performance.
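For orientation, a switching linear dynamic system in its generic textbook form can be written as below. This is only a sketch of the standard model class: the symbols ($s_t$, $x_t$, $y_t$, $A$, $C$, $b$, $d$, $Q$, $R$) are assumed notation, and the abstract does not state how the paper assigns clean and distorted features to these variables or which terms it actually uses.

```latex
\begin{align}
  s_t &\sim P(s_t \mid s_{t-1})
      && \text{(discrete switching state, first-order Markov chain)} \\
  x_t &= A_{s_t}\, x_{t-1} + b_{s_t} + w_t,
      \quad w_t \sim \mathcal{N}(0,\, Q_{s_t})
      && \text{(hidden continuous state dynamics)} \\
  y_t &= C_{s_t}\, x_t + d_{s_t} + v_t,
      \quad v_t \sim \mathcal{N}(0,\, R_{s_t})
      && \text{(observed feature vector)}
\end{align}
```

Because $x_t$ depends on $x_{t-1}$, an estimate at frame $t$ draws on the whole observed sequence rather than on a single frame, which is the sense in which such a model is sequence-to-sequence rather than vector-to-vector.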