Bi-projection for 360° Image Object Detection Bridged by RoI Searcher

Chunyu Lin, Zishuo Zheng, Lang Nie, Kang Liao, Yao Zhao
DOI: https://doi.org/10.1016/j.jvcir.2022.103660
IF: 2.887
2022-01-01
Journal of Visual Communication and Image Representation
Abstract: Object detection on 360° images is a vital component of 3D environment perception. Existing methods either treat panoramic images (usually represented in equirectangular projection, ERP) as normal field-of-view (FoV) images and endure the distortions, or project them into a less-distorted format and narrow the FoV, leading to unsatisfactory performance in practical applications. To solve this problem, we propose a dual-projection 360° object detection network named Bip R-CNN, consisting of three modules: a bi-projection feature extractor, a cross-projection region-of-interest (RoI) searcher, and a classification and regression predictor. Specifically, we extract the equirectangular and corresponding dual-cubemap features simultaneously from the input images. In addition, Projection-Inter Feature Fusion and Projection-Intra Feature Fusion are designed to allow mutual interaction between the bi-projective features and to promote the integration of features at different scales, respectively. In the proposed cross-projection RoI Searcher, we search for the bounding box (BBox) locations on the cubemap from the corresponding ERP spherical proposals, bridging the RoIs of the two projection formats at the feature level. Finally, the cube proposals are used to detect objects in the last predictor module. Considering the scarcity of existing panoramic datasets (only indoor scenes), we propose an efficient approach to convert conventional datasets into annotated panoramic datasets without manual intervention, increasing the diversity of panoramic datasets. Extensive experiments on synthetic and real-world datasets with spherical criteria demonstrate the superiority of our method over other state-of-the-art solutions.
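The abstract describes bridging ERP proposals to cubemap locations. As a rough illustration of the geometry involved (not the paper's RoI Searcher, whose details are not given here), the following minimal sketch maps a single ERP pixel to a cube face and face-local coordinates. The function name `erp_to_cube_face`, the axis convention (+z forward, +x right, +y up), the face names, and the half-pixel center offset are all assumptions made for this example.

```python
import math


def erp_to_cube_face(u, v, width, height):
    """Map an ERP pixel (u, v) to (face, x, y), with x and y in [-1, 1].

    Hypothetical helper: the axis and face-coordinate conventions are
    assumptions for illustration, not taken from the paper.
    """
    # Longitude in [-pi, pi) and latitude in [-pi/2, pi/2], sampled at pixel centers.
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi

    # Unit viewing direction on the sphere.
    dx = math.cos(lat) * math.sin(lon)
    dy = math.sin(lat)
    dz = math.cos(lat) * math.cos(lon)

    # The dominant axis selects the cube face; dividing by its magnitude
    # projects the direction onto that face's plane.
    ax, ay, az = abs(dx), abs(dy), abs(dz)
    if az >= ax and az >= ay:
        face = "front" if dz > 0 else "back"
        x = dx / az if dz > 0 else -dx / az
        y = dy / az
    elif ax >= ay:
        face = "right" if dx > 0 else "left"
        x = -dz / ax if dx > 0 else dz / ax
        y = dy / ax
    else:
        face = "up" if dy > 0 else "down"
        x = dx / ay
        y = -dz / ay if dy > 0 else dz / ay
    return face, x, y
```

For a 512 x 256 ERP image under these assumptions, the image center falls near the middle of the front face, while the top and bottom rows fall on the up and down faces. Mapping a whole bounding box would additionally require handling boxes that straddle face boundaries, which this sketch does not attempt.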