Audio-Visual Variational Fusion for Multi-Person Tracking with Robots

Xavier Alameda-Pineda, Soraya Arias, Yutong Ban, Guillaume Delorme, Laurent Girin, Radu Horaud, Xiaofei Li, Bastien Mourgue, Guillaume Sarrazin
DOI: https://doi.org/10.1145/3343031.3350590
2019-01-01
Abstract: Robust multi-person tracking with robots opens the door to analysing engagement and social signals in real-world environments. Multi-person scenarios are characterised by (i) a time-varying number of people, (ii) intermittent auditory cues (e.g. speech turns) and visual cues (e.g. persons appearing and disappearing), and (iii) the impact of the robot's actions on perception. The various sensors available for perception (cameras and microphones) provide a rich flow of information that is intermittent and complementary in nature. How to jointly exploit these cues to tackle the multi-person tracking problem with an autonomous system has been an intense line of research for the Perception Team over the past few years. In this demo we present our now-mature achievements in the field and demonstrate two robotic systems able to track multiple persons using auditory and visual cues whenever these are available. We will bring the two robots and the necessary computing resources, as well as the presentation materials needed to discuss with attendees the models, methods and tools supporting this technology.