Distant Speech Recognition Based On Position Dependent Cepstral Mean Normalization

Lb Wang, N Kitaoka, S Nakagawa
2004-01-01
Abstract: In a distant environment, channel distortion can dramatically degrade speech recognition performance. In this paper, we propose a robust speech recognition method based on position-dependent Cepstral Mean Normalization (CMN). First, the system measures a priori the transmission characteristics for speaker positions at a set of grid points in the room. In the recognition stage, the system estimates the speaker position in 3-D space from the time delay of arrival (TDOA) between distinct microphone pairs. It then selects the a priori transmission characteristics corresponding to the estimated position, applies a channel distortion compensation method to the speech, and recognizes it. The proposed method also compensates for the mismatch between the cepstral means of utterances spoken by a human and those emitted from a loudspeaker. Our experiments showed that the proposed method efficiently improved the performance of a speech recognition system in a distant environment and that it also compensated well for the mismatch between human and loudspeaker voices.
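The selection-and-compensation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the a priori transmission characteristics are stored as one cepstral mean vector per grid point, that "corresponding to the estimated position" means the nearest grid point, and that the speaker position has already been estimated (for example from TDOA, which is not shown). The names `nearest_grid_index`, `position_dependent_cmn`, `grid_points`, and `grid_means` are hypothetical.

```python
import numpy as np


def nearest_grid_index(grid_points, position):
    """Return the index of the grid point closest to the estimated position.

    Assumption: nearest-neighbour selection in Euclidean distance.
    """
    distances = np.linalg.norm(grid_points - position, axis=1)
    return int(np.argmin(distances))


def position_dependent_cmn(cepstra, position, grid_points, grid_means):
    """Subtract the cepstral mean stored for the grid point nearest to `position`.

    cepstra:     (frames, dims) cepstral features of the utterance
    position:    (3,) estimated speaker position, e.g. obtained from TDOA
    grid_points: (points, 3) positions measured a priori
    grid_means:  (points, dims) cepstral means measured a priori (assumed form
                 of the stored transmission characteristics)
    """
    idx = nearest_grid_index(grid_points, position)
    # Broadcasting subtracts the selected mean vector from every frame.
    return cepstra - grid_means[idx]


if __name__ == "__main__":
    # Toy data: two grid points, two cepstral dimensions, two frames.
    grid_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    grid_means = np.array([[1.0, 2.0], [10.0, 20.0]])
    cepstra = np.array([[11.0, 22.0], [12.0, 23.0]])
    estimated_position = np.array([0.9, 0.1, 0.0])
    print(position_dependent_cmn(cepstra, estimated_position, grid_points, grid_means))
```

Unlike conventional CMN, which computes the mean from the utterance being recognized, this sketch takes the mean from a table measured in advance, so normalization could start without waiting for the utterance to finish.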