A Two-Microphone Method for Localization of Multiple Speech Sources Using Complex Exponential Transform of Phase Differences

Xiaoyan Zhao, Jie Tang, Lin Zhou, Zhenyang Wu
DOI: https://doi.org/10.1121/1.4708283
2012-01-01
Abstract: This paper proposes a two-microphone method for localizing multiple speech sources that exploits the sparsity of speech in the time-frequency domain. The method estimates the time delay of each source by applying a complex exponential transform to the inter-channel phase differences (IPDs). To improve performance in noisy environments, it uses only time-frequency points with high SNR. It first obtains an initial time-delay estimate for each speech source from the IPDs at low frequencies, and then iteratively updates the time delays using all of the selected points. Because of the complex exponential transform on the IPDs, the method takes full advantage of high-frequency phase information and does not require phase compensation for the IPDs at high frequencies. Experiments were conducted to study the effect of the exponential factor on the performance of the proposed method and to compare it with the generalized hard clustering algorithm (GHCA). The results show that the proposed method performs best when the exponential factor is between 0.8 and 1.2, and that it outperforms GHCA under different noise and reverberation conditions, with the performance gain growing as the SNR decreases.
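The abstract describes the pipeline only at a high level. The Python sketch below shows one way its three stages (high-SNR point selection, low-frequency initialization, iterative update on all selected points) could fit together, under assumptions that are ours rather than the paper's: SNR is estimated against a known noise power, the IPD is taken as angle(X1·conj(X2)), initial delays come from histogram peaks of IPD/ω below a cutoff where phases do not wrap, and each iteration assigns every selected point to the delay whose model phase e^{jωτ} best matches e^{j·IPD}, then re-estimates that delay by a local grid search. The function names (select_high_snr, initial_delays, refine_delays) and parameters (threshold_db, omega_low, bin_width, radius, step) are hypothetical. The sketch uses the plain transform e^{j·IPD} and does not model the paper's exponential factor, since the abstract does not say where that factor enters.

```python
import numpy as np


def select_high_snr(X1, X2, noise_power, threshold_db=10.0):
    """Mask of TF points whose estimated SNR exceeds threshold_db in both channels.

    Assumption (hypothetical): noise_power is a known or separately estimated
    per-bin noise power; the abstract does not say how the paper estimates SNR.
    """
    thr = 10.0 ** (threshold_db / 10.0)
    snr1 = np.abs(X1) ** 2 / noise_power
    snr2 = np.abs(X2) ** 2 / noise_power
    return (snr1 > thr) & (snr2 > thr)


def initial_delays(ipd, omega, n_sources, omega_low, tau_max, bin_width):
    """Initial delays from histogram peaks of ipd / omega at low frequencies.

    Assumption: below omega_low, |omega * tau| < pi, so ipd / omega is an
    unambiguous delay. ipd = angle(X1 * conj(X2)), so tau > 0 means the
    signal reaches microphone 2 later than microphone 1.
    """
    low = omega <= omega_low
    taus = ipd[low] / omega[low]
    edges = np.arange(-tau_max, tau_max + bin_width, bin_width)
    counts, edges = np.histogram(taus, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    picks = []
    for _ in range(n_sources):
        i = int(np.argmax(counts))
        picks.append(centers[i])
        # Suppress nearby bins so the next pick comes from a different source.
        counts[np.abs(centers - centers[i]) < 4 * bin_width] = -1
    return np.array(picks)


def refine_delays(ipd, omega, taus, n_iter=10, radius=50e-6, step=1e-6):
    """Alternate point assignment and per-source delay updates on all points.

    Comparing e^{j*ipd} with e^{j*omega*tau} is insensitive to 2*pi wrapping,
    so high-frequency points are used without phase unwrapping. The local grid
    search is an illustrative update rule, not the paper's.
    """
    z = np.exp(1j * ipd)  # complex exponential transform of the IPDs
    taus = np.array(taus, dtype=float)
    offsets = np.arange(-radius, radius + step / 2, step)
    for _ in range(n_iter):
        # match[i, k] = cos(ipd_i - omega_i * tau_k); assign each point to its best source.
        match = np.real(z[:, None] * np.exp(-1j * np.outer(omega, taus)))
        labels = np.argmax(match, axis=1)
        for k in range(len(taus)):
            sel = labels == k
            if not np.any(sel):
                continue
            cand = taus[k] + offsets
            # Sum of cos(ipd - omega * tau) over the cluster, for each candidate tau.
            score = np.real(z[sel][:, None] * np.exp(-1j * np.outer(omega[sel], cand))).sum(axis=0)
            taus[k] = cand[np.argmax(score)]
    return np.sort(taus)
```

Because e^{j·IPD} and e^{jωτ} are compared on the unit circle, a phase that has wrapped past ±π at high frequencies still matches its source's model, so in this formulation only the initial estimate needs the wrap-free low-frequency region.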