Abstract: There are two streams in 3D detection from point clouds: single-stage methods and two-stage methods. While the former are more computationally efficient, the latter usually provide better detection accuracy. By carefully examining the two-stage approaches, we have found that, if appropriately designed, the first stage can produce accurate box regression. In this scenario, the second stage mainly rescores the boxes so that those with better localization get selected. From this observation, we have devised a single-stage anchor-free network that can fulfill these requirements. This network, named AFDetV2, extends the previous work by incorporating a self-calibrated convolution block in the backbone, a keypoint auxiliary supervision, and an IoU prediction branch in the multi-task head. We take a simple product of the predicted IoU score and the classification heatmap to form the final classification confidence. The enhanced backbone strengthens the box localization capability, and the rescoring approach effectively combines the object presence confidence with the box regression accuracy. As a result, the detection accuracy of the single-stage network is drastically boosted. To evaluate our approach, we have conducted extensive experiments on the Waymo Open Dataset and the nuScenes Dataset. AFDetV2 achieves state-of-the-art results on these two datasets, outperforming all prior methods, including both single-stage and two-stage 3D detectors. AFDetV2 won 1st place in the Real-Time 3D Detection of the Waymo Open Dataset Challenge 2021. In addition, a variant of our model, AFDetV2-Base, was named the "Most Efficient Model" by the Challenge Sponsor, demonstrating superior computational efficiency. To demonstrate the generality of this single-stage method, we have also applied it to the first stage of two-stage networks. Without exception, the results show that with the strengthened backbone and the rescoring approach, the second-stage refinement is no longer needed.
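
The rescoring step described in the abstract, a simple product of the predicted IoU score and the classification score, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rescore` and `rank_boxes`, the per-box list inputs, and the clamping of IoU predictions to [0, 1] are assumptions made for illustration only.

```python
def rescore(cls_scores, iou_preds):
    """Fuse per-box classification scores with predicted IoU by a simple product.

    Hypothetical helper: inputs are plain lists with one entry per candidate box.
    """
    if len(cls_scores) != len(iou_preds):
        raise ValueError("cls_scores and iou_preds must have the same length")
    fused = []
    for score, iou in zip(cls_scores, iou_preds):
        # Assumption: clamp the raw IoU prediction to the valid range [0, 1].
        iou = min(max(iou, 0.0), 1.0)
        fused.append(score * iou)
    return fused


def rank_boxes(cls_scores, iou_preds):
    """Return box indices sorted by fused confidence, highest first."""
    fused = rescore(cls_scores, iou_preds)
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Box 0 has the highest classification score but poor predicted localization,
# so after rescoring it ranks below the better-localized boxes 1 and 2.
cls_scores = [0.90, 0.80, 0.60]
iou_preds = [0.50, 0.90, 0.95]
order = rank_boxes(cls_scores, iou_preds)
```

In this toy case the ranking by classification score alone would put box 0 first, while the fused confidence prefers box 1. This mirrors the role the abstract attributes to the second stage: selecting boxes with better localization.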

VFEDet: a variational information bottleneck based feature enhancement object detection network

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

BFE-Net: Object Detection with Bidirectional Feature Enhancement

A Vision Enhancement and Feature Fusion Multiscale Detection Network

Feature Enhancement Network for Object Detection in Optical Remote Sensing Images

Deep Convolutional Feature Enhancement for Remote Sensing Object Detection

Infrared and Visible Image Object Detection via Focused Feature Enhancement and Cascaded Semantic Extension

FEFN: Feature Enhancement Feedforward Network for Lightweight Object Detection in Remote Sensing Images

Feature-enhanced composite backbone network for object detection

Research on Feature Enhancement for Small Object Detection

DVFENet: Dual-branch voxel feature extraction network for 3D object detection

DetNet: A Backbone network for Object Detection

Comprehensive Feature Enhancement Module For Single-Shot Object Detector

FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection

When CNN meet with ViT: decision-level feature fusion for camouflaged object detection

MDFN: Multi-scale deep feature learning network for object detection

Small Object Detection Network Based on Feature Information Enhancement

EFR-FCOS: enhancing feature reuse for anchor-free object detector

FFEDet: Fine-Grained Feature Enhancement for Small Object Detection

AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point Clouds

Complementary Feature Pyramid Network for Object Detection