Self-Supervised Learning for Alignment of Objects and Sound

Xinzhu Liu, Xiaoyu Liu, Di Guo, Huaping Liu, Fuchun Sun, Haibo Min
DOI: https://doi.org/10.1109/icra40945.2020.9197566
2020-01-01
Abstract: The sound source separation problem has many useful applications in robotics, such as human-robot interaction and scene understanding. However, it remains a very challenging problem. In this paper, we use both the visual and audio information in videos to perform the sound source separation task. We propose a self-supervised learning framework that implements the object detection and sound separation modules simultaneously. This approach is designed to better find the alignment between the detected objects and the separated sound components. Our experiments, conducted on both synthetic and real datasets, validate this approach and demonstrate the effectiveness of the proposed model in the task of object and sound alignment.