Multimodal Fusion for Indoor Sound Source Localization

Jinhui Chen, Ryoichi Takashima, Xingchen Guo, Zhihong Zhang, Xuexin Xu, Tetsuya Takiguchi, Edwin R. Hancock
DOI: https://doi.org/10.1016/j.patcog.2021.107906
IF: 8
2021-01-01
Pattern Recognition
Abstract:
• We propose a novel solution that fuses visual and acoustic models to accurately localize sound sources.
• We develop an HMM-based method for separating the acoustic transfer function (ATF) to describe clean speech sound.
• We propose a new Fourier-domain method for fast implementation of the HOG-type polar feature descriptor.
• The proposed method is rotation-invariant and also preserves the discriminative power of the extracted features.