3D Audio-Visual Speaker Tracking with A Two-Layer Particle Filter.

Hong Liu, Yidi Li, Bing Yang
DOI: https://doi.org/10.1109/icpr48806.2021.9412682
2019-01-01
Abstract: Audio-visual speaker tracking in 3D space is a challenging problem. Although classical particle-filter-based methods have proven effective for audio-visual speaker tracking, their performance degrades considerably when the measurements are corrupted by noise. To address this, a novel two-layer particle filter is proposed for 3D audio-visual speaker tracking. First, two groups of particles, generated from the audio and video streams respectively, are propagated independently in the audio layer and the visual layer. Then, the audio and visual likelihoods are combined through an adaptive sigmoid function, which adjusts particle weights according to the confidence of the two modalities. Finally, an optimal particle set is selected from the two groups of particles to determine the speaker position and to reset the particle positions in the next frame. Experiments on the AV16.3 database show that our method outperforms trackers that use individual modalities as well as existing approaches, both in 3D space and on the image plane.
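
The fusion and selection steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual equations, so everything here is an assumption made for illustration: the Gaussian likelihood model, the use of each modality's peak likelihood as its confidence, the linear blend controlled by a sigmoid gate, and the names `gaussian_likelihood`, `fuse_weights`, `select_and_estimate`, `slope`, and `keep`. Particle propagation in the two layers is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gaussian_likelihood(particle, observation, sigma):
    """Isotropic Gaussian likelihood of a 3D particle given a 3D observation
    (an assumed measurement model, not taken from the paper)."""
    d2 = sum((p - o) ** 2 for p, o in zip(particle, observation))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def fuse_weights(audio_lik, visual_lik, slope=5.0):
    """Blend per-particle likelihoods with a sigmoid gate and normalize.

    Assumption: each modality's confidence is its peak likelihood, and the
    gate moves toward the more confident modality.
    """
    gate = sigmoid(slope * (max(visual_lik) - max(audio_lik)))
    fused = [(1.0 - gate) * a + gate * v for a, v in zip(audio_lik, visual_lik)]
    total = sum(fused)
    if total == 0.0:
        # Both modalities uninformative: fall back to uniform weights.
        return [1.0 / len(fused)] * len(fused)
    return [w / total for w in fused]


def select_and_estimate(particles, weights, keep):
    """Keep the `keep` highest-weight particles and return them together
    with their weighted mean position."""
    ranked = sorted(zip(weights, particles), key=lambda wp: wp[0], reverse=True)
    ranked = ranked[:keep]
    total = sum(w for w, _ in ranked)
    estimate = tuple(sum(w * p[i] for w, p in ranked) / total for i in range(3))
    return [p for _, p in ranked], estimate


# Toy example: two particles from each layer, pooled and scored by both modalities.
audio_particles = [(1.0, 2.0, 1.5), (1.2, 2.1, 1.4)]
visual_particles = [(1.05, 2.0, 1.5), (3.0, 0.5, 1.0)]
pooled = audio_particles + visual_particles

audio_obs = (1.1, 2.0, 1.5)   # hypothetical audio-derived position
visual_obs = (1.0, 2.0, 1.5)  # hypothetical vision-derived position

audio_lik = [gaussian_likelihood(p, audio_obs, sigma=0.5) for p in pooled]
visual_lik = [gaussian_likelihood(p, visual_obs, sigma=0.2) for p in pooled]

weights = fuse_weights(audio_lik, visual_lik)
selected, estimate = select_and_estimate(pooled, weights, keep=2)
print(selected, estimate)
```

In this toy run the outlying visual particle receives a negligible fused weight and is excluded from the selected set, which would then seed the particles for the next frame.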