Robust speech source 3D localization method based on discrete time delay

Weiping Cai, Zhenyang Wu
2009-01-01
Abstract: To reduce the computational load of steered response power-phase transform (SRP-PHAT), a robust speech source localization algorithm, this paper presents an improved SRP-PHAT algorithm based on discrete time delays. In this method, a frame of signal from the microphone array is transformed into the frequency domain by FFT (fast Fourier transform), and the number of sample points is then increased 16-fold by zero-padding in the frequency domain. Taking the IFFT (inverse fast Fourier transform) then yields a generalized cross-correlation (GCC) function at a higher sampling rate. Because all the GCCs can be computed before the search begins, the computational load is significantly reduced. Moreover, the localization errors introduced by discretizing the time delay are small enough to ignore, owing to the high sampling rate of the GCC. Simulation results show that the method reduces the computational load by one order of magnitude while remaining robust in both far-field and near-field conditions.
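The interpolation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gcc_phat_upsampled` and `gcc_at_delay`, the `max_lag` parameter, the doubling of the FFT length, and the small regularization constant are all hypothetical choices made for the example. Only the 16-fold factor and the FFT, zero-padding, IFFT sequence come from the abstract.

```python
import numpy as np


def gcc_phat_upsampled(x1, x2, upsample=16, max_lag=None):
    """Interpolated GCC-PHAT of two equal-length frames.

    Returns (delay, gcc): delay is the estimated lag of x2 relative to x1
    in original samples (resolution 1/upsample), and gcc holds the
    interpolated correlation for lags -max_lag..+max_lag.
    """
    n = len(x1)
    if max_lag is None:
        max_lag = n - 1
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    # Assumption: double the FFT length so the circular correlation
    # approximates a linear one.
    n_fft = 2 * n
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X2 * np.conj(X1)
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross /= np.abs(cross) + 1e-12
    # Requesting a longer output makes irfft pad the spectrum with zeros,
    # which interpolates the correlation in the lag domain.
    full = np.fft.irfft(cross, n=upsample * n_fft)
    m = max_lag * upsample
    # Reorder so that index 0 is lag -max_lag and index m is lag 0.
    gcc = np.concatenate((full[-m:], full[:m + 1]))
    delay = (np.argmax(gcc) - m) / upsample
    return delay, gcc


def gcc_at_delay(gcc, delay, upsample=16):
    """Look up a precomputed GCC at the discrete lag nearest to delay."""
    m = (len(gcc) - 1) // 2
    idx = int(round(delay * upsample)) + m
    # Clamp to the stored lag range instead of raising on out-of-range lags.
    return gcc[min(max(idx, 0), len(gcc) - 1)]
```

With a table like `gcc` stored for each microphone pair, a search over candidate source positions only needs to round each candidate's theoretical delay to the nearest 1/16 sample and read the value with `gcc_at_delay`, which is presumably where the reported saving comes from.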