CrossNet: Detecting Objects As Crosses.

Jiaxu Leng, Ying Liu, Zhihui Wang, Haibo Hu, Xinbo Gao
DOI: https://doi.org/10.1109/tmm.2021.3060278
IF: 7.3
2022-01-01
IEEE Transactions on Multimedia
Abstract: Deep learning has driven major breakthroughs in object detection. However, existing detectors still struggle in challenging settings such as dense objects, small objects, and large variations in object scale. To address these issues, this paper proposes a novel keypoint-based detection framework, called CrossNet, which significantly improves detection performance at minimal cost. In our approach, an object is modeled as a cross that consists of a center keypoint and a specific size, which eliminates the need for hand-crafted anchor design. CrossNet outputs three types of maps: a center map, a size map, and an offset map. The center map and offset map together predict the center keypoints of objects, and the size map estimates their sizes (width and height). Specifically, we first design a cascaded center prediction method that applies a coarse-to-fine strategy to improve center prediction. Furthermore, since center prediction, treated as a classification task, is relatively easier than size regression, we design a center-attention size regression module that uses the detected centers to assist size prediction. In addition, a slightly modified hourglass network is designed to enhance the quality of the feature maps used for center and size prediction. Extensive experiments on the challenging PASCAL VOC, COCO, KITTI, and WiderFace datasets demonstrate the effectiveness of CrossNet. Empirical studies show that CrossNet achieves results competitive with top-ranked one-stage and two-stage detectors while remaining time-efficient.
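To make the three output maps concrete, the following is a minimal sketch of how a center map, size map, and offset map could be decoded into boxes. It is not the authors' implementation: the function name `decode_crosses`, the array layouts, the output stride of 4, the 3x3 peak test, and the score threshold are all assumptions made for illustration, and the cascaded center prediction and center-attention modules described in the abstract are not modeled here.

```python
import numpy as np

def decode_crosses(center_map, size_map, offset_map, stride=4, threshold=0.5):
    """Decode keypoint maps into boxes (x1, y1, x2, y2, score).

    Assumed (hypothetical) layouts:
      center_map: (H, W) float scores in [0, 1]
      size_map:   (2, H, W) width and height in input-image pixels
      offset_map: (2, H, W) sub-cell x and y offsets of the center
    """
    H, W = center_map.shape
    # Pad with -inf so border cells can still be compared with a full 3x3 window.
    padded = np.pad(center_map, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views to get each cell's 3x3 neighbourhood.
    neighbourhood = np.stack(
        [padded[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)]
    )
    # A center is a local maximum whose score passes the threshold.
    is_peak = (center_map >= neighbourhood.max(axis=0)) & (center_map >= threshold)

    boxes = []
    for y, x in zip(*np.nonzero(is_peak)):
        # Refine the grid position with the offset, then map back to image scale.
        cx = (x + offset_map[0, y, x]) * stride
        cy = (y + offset_map[1, y, x]) * stride
        w, h = size_map[0, y, x], size_map[1, y, x]
        # The "cross" (center plus width and height) becomes a box.
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2,
                      float(center_map[y, x])))
    return boxes
```

In this sketch the center map and offset map jointly locate the center, and the size map supplies the two arms of the cross, so no anchor boxes are needed at any step.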