Abstract: 3D object detection plays a pivotal role in autonomous driving. Single-stage detectors are fast but often fall short in accuracy, and we identify two main causes. First, prediction accuracy differs significantly across Intersection over Union (IoU) thresholds, which indicates localization errors in the model. Second, traditional point-based detectors rely heavily on 1×1 convolutions in the Set Abstraction layer and therefore neglect the relationships between adjacent points. To address these issues, we present the Magnification Transformation Single-Stage Detector (MT-SSD), which features a magnification Linear Transformation Module. This module applies a linear transformation to the original point cloud, the sampling radius, and the object labels, magnifying the error between model predictions and ground-truth values. During inference, the inverse linear transformation is applied to the detections to recover accurate object localization. MT-SSD also introduces the Contextual Set Abstraction (CSA) layer, which adds 1×N convolutions to the Set Abstraction layer to aggregate features among neighboring points more thoroughly. Evaluations on several autonomous driving datasets validate the performance and efficiency of MT-SSD. On the Waymo Open Dataset in particular, MT-SSD sets a series of state-of-the-art results among single-stage 3D object detectors. The code is available at https://github.com/qifeng22/MT-SSD.
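
The magnification idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the linear transformation is a uniform scaling by a hypothetical factor `scale`, that boxes are stored as `(x, y, z, dx, dy, dz, heading)`, and that the function names `magnify` and `demagnify_boxes` are illustrative only.

```python
import numpy as np

def magnify(points, boxes, radius, scale):
    """Scale the point cloud, box labels, and sampling radius by one factor.

    Assumed layouts (not taken from the paper):
      points: (N, 3 + C) array, xyz followed by C extra features
      boxes:  (M, 7) array, (x, y, z, dx, dy, dz, heading)
    """
    points_m = points.copy()
    points_m[:, :3] *= scale      # scale xyz only; extra features are untouched
    boxes_m = boxes.copy()
    boxes_m[:, :6] *= scale       # scale centre and size; heading is unchanged
    return points_m, boxes_m, radius * scale

def demagnify_boxes(pred_boxes, scale):
    """Apply the inverse scaling to detections produced in magnified space."""
    out = pred_boxes.copy()
    out[:, :6] /= scale
    return out

# Toy data: two points with one intensity feature, and one labelled box.
points = np.array([[1.0, 2.0, 0.5, 0.3],
                   [4.0, -1.0, 0.2, 0.9]])
boxes = np.array([[2.0, 0.5, 0.4, 4.0, 1.8, 1.5, 0.1]])

points_m, boxes_m, radius_m = magnify(points, boxes, radius=0.8, scale=2.0)
restored = demagnify_boxes(boxes_m, scale=2.0)
```

Under this uniform-scaling assumption, a centre error of `e` metres in the magnified space corresponds to `e / scale` metres after the inverse transformation, which is one way to read the abstract's claim that magnifying the error during training improves localization.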

A Simple Baseline for Multi-Camera 3D Object Detection

Leveraging Front and Side Cues for Occlusion Handling in Monocular 3D Object Detection

MT-SSD: Single-Stage 3D Object Detector Based on Magnification Transformation

SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection

Towards Unified 3D Object Detection via Algorithm and Data Unification

MonoSIM: Simulating Learning Behaviors of Heterogeneous Point Cloud Object Detectors for Monocular 3D Object Detection

Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting

A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation

OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection

Weakly Supervised Monocular 3D Object Detection by Spatial-Temporal View Consistency

Multi-Sensor 3D Object Box Refinement for Autonomous Driving

MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones

Monocular 3D object detection via estimation of paired keypoints for autonomous driving

SRCN3D: Sparse R-CNN 3D Surround-View Camera Object Detection and Tracking for Autonomous Driving

SGM3D: Stereo Guided Monocular 3D Object Detection

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection