TSSTDet: Transformation-Based 3-D Object Detection via a Spatial Shape Transformer

Hiep Anh Hoang, Duy Cuong Bui, Myungsik Yoo
DOI: https://doi.org/10.1109/jsen.2024.3350770
IF: 4.3
2024-03-02
IEEE Sensors Journal
Abstract: Accurately detecting objects and understanding their shapes in 3-D scenes is essential for autonomous driving. In a 3-D scene, objects appear with incomplete shapes and in varied rotations. Determining an object's shape allows a comprehensive understanding of its dimensions, rotation, and spatial relationships with its surroundings. Traditional detection methods do not explicitly consider the rotations that objects can take or their complete shapes. Consequently, these methods require large networks and extensive data augmentation to detect objects accurately. Taking advantage of the vision transformer (ViT), we introduce an efficient transformer-based 3-D detector, transformation-based 3-D object detection via a spatial shape transformer (TSSTDet), to address these challenges. TSSTDet is a multistage detector that operates on light detection and ranging (LiDAR) point clouds. Specifically, TSSTDet uses a sparse convolution (SpConv) backbone to extract multichannel, transformation-equivariant voxel features. Furthermore, we designed an efficient module that uses a transformer to estimate the complete shape of an object. These features are then aligned and aggregated into lightweight, compact representations that enable high-performance 3-D object detection. We assessed the effectiveness of the proposed framework on both the KITTI dataset and the Waymo Open Dataset (WOD). These evaluations demonstrate that our framework achieves top-tier 3-D object detection performance.
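The abstract does not say how the transformation-equivariant features are produced or how they are "aligned and aggregated". One generic way to do this, offered here only as an assumption and not as TSSTDet's published method, is to run a shared backbone on several rotated copies of the point cloud, rotate each resulting feature map back into the input frame (alignment), and pool across the copies (aggregation). The sketch below assumes a 2-D bird's-eye-view voxel grid, only the four 90-degree rotations about the z-axis, and points that never lie exactly on a voxel boundary. `toy_backbone` is a hypothetical stand-in for the SpConv backbone, and the transformer shape-completion module is not modeled at all.

```python
import math
from collections import defaultdict

# Toy sketch of "transform -> shared backbone -> align -> aggregate".
# Assumptions (not from the paper): 2-D bird's-eye-view voxels, only the
# four quarter-turn rotations about the z-axis, and points that never lie
# exactly on a voxel boundary.

Point = tuple[float, float, float]
Cell = tuple[int, int]


def rotate_points(points: list[Point], k: int) -> list[Point]:
    """Rotate points by k * 90 degrees about z; (x, y) -> (-y, x) is exact."""
    out = list(points)
    for _ in range(k % 4):
        out = [(-y, x, z) for x, y, z in out]
    return out


def rotate_cell(cell: Cell, k: int) -> Cell:
    """Voxel index that a point's cell moves to after k quarter turns."""
    i, j = cell
    for _ in range(k % 4):
        i, j = -j - 1, i  # floor(-t) == -floor(t) - 1 for non-integer t
    return (i, j)


def toy_backbone(points: list[Point], voxel: float = 1.0) -> dict[Cell, tuple[int, float]]:
    """Hypothetical stand-in for the SpConv backbone.

    Per occupied voxel it returns (point count, extent along x). The extent
    depends on orientation, so this backbone is NOT rotation-equivariant.
    """
    xs_per_cell: dict[Cell, list[float]] = defaultdict(list)
    for x, y, _z in points:
        xs_per_cell[(math.floor(x / voxel), math.floor(y / voxel))].append(x)
    return {c: (len(xs), max(xs) - min(xs)) for c, xs in xs_per_cell.items()}


def te_features(points: list[Point], voxel: float = 1.0) -> dict[Cell, tuple]:
    """Run the shared backbone on 4 rotated copies, align, then max-pool."""
    aligned: dict[Cell, list[tuple]] = defaultdict(list)
    for k in range(4):
        feats = toy_backbone(rotate_points(points, k), voxel)
        for cell, f in feats.items():
            # Undo the k quarter turns so all branches share the input frame.
            aligned[rotate_cell(cell, -k)].append(f)
    # Element-wise max over the four aligned branches.
    return {c: tuple(max(v) for v in zip(*fs)) for c, fs in aligned.items()}
```

In this toy setting, calling `te_features` on a rotated copy of the cloud gives the same result as rotating the cell indices of `te_features` on the original cloud, while `toy_backbone` alone does not have that property. Max-pooling is just one possible aggregation; the paper's actual alignment and aggregation steps may differ.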
Engineering, Electrical & Electronic; Instruments & Instrumentation; Physics, Applied