Abstract: In object detection, precise object representation is key to successfully classifying and locating objects in an image. Existing methods usually represent objects with rectangular anchor boxes or a set of points. However, these methods either introduce background noise or miss the continuous appearance information inside the object, and thus cause incorrect detection results. In this paper, we propose a novel anchor-free object detection network, called CrossDet++, which uses a set of growing crosslines along the horizontal and vertical axes as object representations. An object can be flexibly represented by different combinations of crosslines, which motivates us to select the most expressive crosslines and thereby reduce the interference of noise. Meanwhile, the crossline representation takes into account the continuous adjacent object information, which helps enhance the discriminability of object features and find object boundaries. Based on the learned crosslines, we propose an axis-query crossline growing module that adaptively captures crossline features and queries surrounding pixels related to those line features for the subsequent growing of crosslines. The growing offsets and scales are supervised by a decoupled regression mechanism, which limits each regression target to a specific direction to reduce the optimization difficulty. During training, we design a semantic-guided label assignment that emphasizes crossline targets with higher semantic richness, further improving detection performance. Experimental results demonstrate the effectiveness of the proposed method. Code is available at: https://github.com/QiuHeqian/CrossDet.
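
To make the crossline idea concrete, the following is a minimal illustrative sketch, not the CrossDet++ implementation. It assumes a crossline is one horizontal and one vertical segment, that the bounding box is read off from the segment endpoints, and that "growing" extends one endpoint at a time along its own axis, loosely mirroring the direction-specific regression described above. The class name `Crossline`, its fields, and the `grow` and `to_box` methods are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crossline:
    """Hypothetical crossline: one horizontal and one vertical segment."""
    h_y: float       # y position of the horizontal segment
    x_left: float    # left endpoint of the horizontal segment
    x_right: float   # right endpoint of the horizontal segment
    v_x: float       # x position of the vertical segment
    y_top: float     # top endpoint of the vertical segment
    y_bottom: float  # bottom endpoint of the vertical segment

    def to_box(self):
        """Return (x1, y1, x2, y2) spanned by the segment endpoints."""
        return (self.x_left, self.y_top, self.x_right, self.y_bottom)

    def grow(self, d_left=0.0, d_right=0.0, d_top=0.0, d_bottom=0.0):
        """Extend each endpoint outward along its own axis only.

        Each offset moves exactly one endpoint in one direction, an
        assumed stand-in for a direction-specific regression target.
        """
        return Crossline(
            h_y=self.h_y,
            x_left=self.x_left - d_left,
            x_right=self.x_right + d_right,
            v_x=self.v_x,
            y_top=self.y_top - d_top,
            y_bottom=self.y_bottom + d_bottom,
        )


# Usage: an initial crossline, then grown to the left and downward.
line = Crossline(h_y=50, x_left=20, x_right=80, v_x=45, y_top=10, y_bottom=90)
grown = line.grow(d_left=5, d_bottom=10)
print(line.to_box())
print(grown.to_box())
```

In this sketch each offset affects a single coordinate, so an error in one direction does not perturb the other three box sides; the actual supervision and feature sampling used by the network are not shown here.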

DE-CrossDet: Divisible and Extensible Crossline Representation for Object Detection

CrossDet - Crossline Representation for Object Detection.

CrossDet++: Growing Crossline Representation for Object Detection

A Transformer-Based Object Detector with Coarse-Fine Crossing Representations

Cross-modal Deformable DETR for RGB-D Object Detection

Cross Resolution Encoding-Decoding For Detection Transformers

Feature Combination Based On Receptive Fields And Cross-Fusion Feature Pyramid For Object Detection

DetNet: A Backbone network for Object Detection

A novel fast combine-and-conquer object detector based on only one-level feature map

EfficientDet: Scalable and Efficient Object Detection

RESC: REfine the SCore with Adaptive Transformer Head for End-to-end Object Detection

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

DetNet: Design Backbone for Object Detection

A MultiPath Network for Object Detection

Deformable DETR: Deformable Transformers for End-to-End Object Detection

C$^{2}$DFNet: Criss-Cross Dynamic Filter Network for RGB-D Salient Object Detection

Cross-scale information enhancement for object detection

Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion

Disentangle Your Dense Object Detector

An Improved DETR Based on Angle Denoising and Oriented Boxes Refinement for Remote Sensing Object Detection

Exploring Context Information for Accurate and Fast Object Detection