SCFANet: Semantics and Context Feature Aggregation Network for 360° Salient Object Detection

Zhentao He, Feng Shao, Gang Chen, Xiongli Chai, Yo-Sung Ho
DOI: https://doi.org/10.1109/tmm.2023.3293994
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Handling geometric distortion is the key to salient object detection (SOD) in 360° omnidirectional images. Most current methods integrate global and local visual cues by fusing the 360° equirectangular images with the corresponding 360° cube-map images. Fusion at a single level, however, cannot effectively exploit the information shared between the 360° equirectangular images and the corresponding 360° cube-map images. In this work, we propose a semantics and context feature aggregation network (SCFANet) that fully explores the interaction between the two projections. Specifically, we use a Vision Transformer (ViT) to capture global visual cues from the 360° equirectangular images and a Convolutional Neural Network (CNN) to capture local visual cues from the 360° cube-map images. To fuse the two projections effectively, we design a semantic guidance module (SGM), in which semantic features guide the information fusion of the 360° equirectangular images and the corresponding 360° cube-map images at each level. Then, a context fusion module (CFM) with one local input and two context inputs is designed to integrate multi-scale features: the local input extracts its own multi-scale information, and the context inputs complement it with fine details and location information. Finally, we use a feature aggregation and refinement module (FARM) to aggregate the semantic and context features, and we adopt a deep supervision strategy for training. Extensive experiments on two public 360° datasets show that SCFANet achieves competitive performance compared with other state-of-the-art (SOTA) 360° salient object detection models.
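The relationship between the two projections mentioned in the abstract can be illustrated with a small geometric sketch. This is not code from the paper: the function name, the face labels, and the axis convention (x right, y up, z forward, with the front face centred in the equirectangular image) are assumptions chosen only for illustration. It maps one pixel of a cube-map face to its fractional position in the equirectangular image.

```python
import math

def cube_face_pixel_to_equirect(face, i, j, face_size, erp_width, erp_height):
    """Map pixel (row i, col j) of a cube face to (row, col) in an
    equirectangular image. Hypothetical helper; the axis convention is assumed."""
    # Normalised face coordinates in [-1, 1], sampled at pixel centres.
    a = 2.0 * (j + 0.5) / face_size - 1.0  # horizontal, right is positive
    b = 2.0 * (i + 0.5) / face_size - 1.0  # vertical, down is positive

    # Viewing direction for each face (x right, y up, z forward).
    directions = {
        "front": (a, -b, 1.0),
        "back": (-a, -b, -1.0),
        "right": (1.0, -b, -a),
        "left": (-1.0, -b, a),
        "up": (a, 1.0, b),
        "down": (a, -1.0, -b),
    }
    x, y, z = directions[face]

    # Spherical angles of that direction.
    lon = math.atan2(x, z)                 # longitude in [-pi, pi], 0 at front
    lat = math.atan2(y, math.hypot(x, z))  # latitude in [-pi/2, pi/2]

    # Longitude maps linearly to columns, latitude to rows.
    col = (lon / (2.0 * math.pi) + 0.5) * erp_width
    row = (0.5 - lat / math.pi) * erp_height
    return row, col
```

Under these assumptions, the centre of the front face lands at the centre of the equirectangular image, while the centre of the up face lands on its top row, where a single viewing direction is stretched across the full image width. That stretching is the kind of geometric distortion the abstract refers to.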
computer science, information systems, telecommunications, software engineering