Active Object Discovery and Localization Using Sound-Induced Attention

Huaping Liu, Feng Wang, Di Guo, Xinzhu Liu, Xinyu Zhang, Fuchun Sun
DOI: https://doi.org/10.1109/tii.2020.3000240
IF: 12.3
2021-01-01
IEEE Transactions on Industrial Informatics
Abstract: Industrial intelligent devices are usually equipped with both microphones and cameras to perceive and understand the physical world. Although visual object detection has achieved great success, combining it with other sensing modalities remains an open problem. In this article, we establish a novel sound-induced attention framework for visual object detection and develop a two-stream, weakly supervised deep learning architecture that combines the visual and audio modalities to localize the sounding object. A dataset is constructed from Audio Set to validate the proposed method, and realistic experiments are conducted to demonstrate the effectiveness of the proposed system.