Abstract: Multi-view 3D object detection (MV3D) has made tremendous progress by leveraging perspective features from multiple surrounding cameras. Despite its promise in various applications, accurately detecting objects in 3D space from camera views remains extremely difficult because monocular depth estimation is ill-posed. Recently, Graph-DETR3D presented a novel graph-based 3D-2D query paradigm for aggregating multi-view images for 3D object detection and achieved competitive performance. Although it enriches the query representations with 2D image features through a learnable 3D graph, its depth and velocity estimation remain limited because it adopts a single-frame input setting. To solve this problem, we introduce a unified spatial-temporal graph modeling framework that fully leverages multi-view imagery cues under the multi-frame input setting. Thanks to the flexibility and sparsity of the dynamic graph architecture, we lift the original 3D graph into 4D space with an effective attention mechanism that automatically perceives imagery information at both the spatial and temporal levels. Moreover, since the main latency bottleneck lies in the image backbone, we propose a novel dense-sparse distillation framework for multi-view 3D object detection that reduces the computational budget without sacrificing detection accuracy, making it more suitable for real-world deployment. Combining these components, we propose Graph-DETR4D, a faster and stronger multi-view 3D object detection framework built on top of Graph-DETR3D. Extensive experiments on the nuScenes and Waymo benchmarks demonstrate the effectiveness and efficiency of Graph-DETR4D. Notably, our best model achieves 62.0% NDS on the nuScenes test leaderboard. Code is available at https://github.com/zehuichen123/Graph-DETR4D.

Learning Class-Based Graph Representation for Object Detection

Inference Fusion with Associative Semantics for Unseen Object Detection

Dynamically Connected Graph Representation for Object Detection

Learning to Infer Unseen Single-/Multi-Attribute-Object Compositions with Graph Networks

Relation Networks for Object Detection

A Study of Parts-Based Object Class Detection Using Complete Graphs

Relation-Aware Reasoning with Graph Convolutional Network

Semantic-Context Graph Network for Point-based 3D Object Detection

Graph-DETR4D: Spatio-Temporal Graph Modeling for Multi-View 3D Object Detection

Detecting Objects with Context-Likelihood Graphs and Graph Refinement

A graph reasoning method for multi-object unordered stacking scenarios

Relation Matters: Foreground-aware Graph-based Relational Reasoning for Domain Adaptive Object Detection

Recurrent Adaptive Graph Reasoning Network With Region and Boundary Interaction for Salient Object Detection in Optical Remote Sensing Images

Graph Representation Learning Meets Computer Vision: A Survey

DRG: Dual Relation Graph for Human-Object Interaction Detection

AGO-Net: Association-Guided 3D Point Cloud Object Detection Network

Graph-based High-Order Relation Discovery for Fine-grained Recognition
