Anchor-Based Spatio-Temporal Attention 3-D Convolutional Networks for Dynamic 3-D Point Cloud Sequences

Guangming Wang, Hanwen Liu, Muyao Chen, Yehui Yang, Zhe Liu, Hesheng Wang
DOI: https://doi.org/10.1109/tim.2021.3106101
IF: 5.6
2021-01-01
IEEE Transactions on Instrumentation and Measurement
Abstract: With the rapid development of measurement technology, light detection and ranging (LiDAR) and depth cameras are widely used in the perception of the 3-D environment. Recent learning-based methods for robot perception mostly focus on images or video, while deep learning methods for dynamic 3-D point cloud sequences remain underexplored. Therefore, developing an efficient and accurate perception method compatible with these advanced instruments is pivotal to autonomous driving and service robots. An anchor-based spatio-temporal attention 3-D convolution (ASTA3DConv) operation is proposed in this article to process dynamic 3-D point cloud sequences. The proposed convolution operation builds a regular receptive field around each point by setting several virtual anchors around it. The features of neighborhood points are first aggregated to each anchor based on the spatio-temporal attention mechanism. Then, anchor-based 3-D convolution is adopted to aggregate these anchors' features to the core points. The proposed method makes better use of the structured information within the local region and learns spatio-temporal embedding features from dynamic 3-D point cloud sequences. Anchor-based spatio-temporal attention 3-D convolutional neural networks (ASTA3DCNNs) are built for classification and segmentation tasks based on the proposed ASTA3DConv and evaluated on action recognition and semantic segmentation tasks. The experiments and ablation studies on the MSRAction3D and Synthia datasets demonstrate the superior performance and effectiveness of our method for dynamic 3-D point cloud sequences. Our method achieves state-of-the-art performance among methods that take dynamic 3-D point cloud sequences as input on the MSRAction3D and Synthia datasets.
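
The two-stage aggregation described in the abstract (neighbors to anchors via attention, then anchors to the core point via convolution) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function name `anchor_attention_conv`, the anchor offsets, the linear attention score `w_att`, and the single weight matrix `w_conv` are all hypothetical simplifications. In particular, the sketch lets every neighbor attend to every anchor and treats points from different frames as one neighbor set, which the paper may handle differently.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def anchor_attention_conv(core, neighbors, feats, anchor_offsets, w_att, w_conv):
    """Hypothetical sketch of anchor-based attention followed by convolution.

    core:           (3,)     coordinates of the core point
    neighbors:      (K, 3)   neighbor coordinates (possibly from several frames)
    feats:          (K, C)   neighbor features
    anchor_offsets: (A, 3)   assumed offsets of the virtual anchors from the core
    w_att:          (3 + C,) assumed linear attention scoring vector
    w_conv:         (A*C, D) assumed weights mixing all anchor features
    """
    num_anchors = anchor_offsets.shape[0]
    num_neighbors, num_channels = feats.shape

    # Place the virtual anchors around the core point: (A, 3).
    anchors = core + anchor_offsets

    # Position of every neighbor relative to every anchor: (A, K, 3).
    rel = neighbors[None, :, :] - anchors[:, None, :]

    # Attention input is relative position plus neighbor feature: (A, K, 3 + C).
    tiled = np.broadcast_to(feats, (num_anchors, num_neighbors, num_channels))
    att_in = np.concatenate([rel, tiled], axis=-1)

    # Stage 1: attention weights over neighbors, one distribution per anchor.
    weights = softmax(att_in @ w_att, axis=1)   # (A, K)
    anchor_feats = weights @ feats              # (A, C)

    # Stage 2: because the anchors form a fixed, regular layout, their features
    # can be flattened in a fixed order and mixed by one weight matrix.
    return anchor_feats.reshape(-1) @ w_conv    # (D,)
```

Because the anchors sit at fixed offsets from each core point, the second stage can use a fixed-size weight matrix regardless of how many neighbors fall in the region, which is one way to read the "regular receptive field" mentioned above.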
engineering, electrical & electronic; instruments & instrumentation