You Will Never Walk Alone: One-Shot 3D Action Recognition with Point Cloud Sequence

Xingyu Tong, Yang Xiao, Bo Tan, Jianyu Yang, Zhiguo Cao, Joey Tianyi Zhou, Junsong Yuan
DOI: https://doi.org/10.1109/tcsvt.2024.3421304
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: In this work, we make the first attempt to address one-shot 3D action recognition in point cloud sequences, without skeleton information. The main contribution is twofold. First, we propose a novel one-shot classification approach that considers the feature distribution of 3D actions. We find that, for different 3D actions, the dimension-wise feature distributions are generally Gaussian in form, and that similar action categories have approximately the same feature distributions. Accordingly, the mean values and covariance matrices of the K nearest base classes help to form a pseudo feature distribution for the one-shot novel class. To alleviate potential ambiguity in the nearest-neighbor search, we divide the base classes into subsets via C-means clustering to facilitate similarity measurement against the novel class. The feature distributions of each base class's whole set and of its subsets are then jointly considered when generating the novel class's pseudo feature distribution. Multi-dimensional Gaussian sampling is conducted on the acquired pseudo feature distribution for feature-level data augmentation, so that the one-shot novel class will “never walk alone” during classifier training. Second, to better characterize fine-grained 3D actions, we propose a temporal attention method that introduces a vision Transformer (ViT) to capture an action's discriminative short-term motion patterns from short-term 3DV (3D dynamic voxel) features densely sampled along the temporal dimension. Experiments on NTU RGB+D 120 and NTU RGB+D 60 verify the superiority of our approach, which outperforms state-of-the-art skeleton-based methods by up to 13.9%. The source code will be released upon acceptance.
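The feature-level augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pseudo_distribution` and `augment`, the rule of averaging the novel feature with the K nearest base-class means, the Euclidean distance, and the regularization term `alpha` are all assumptions made for illustration, and the C-means subset step is omitted.

```python
import numpy as np


def pseudo_distribution(novel_feat, base_means, base_covs, k=2, alpha=0.1):
    """Estimate a pseudo Gaussian for a one-shot novel class (illustrative only).

    novel_feat: (d,) feature of the single novel-class sample.
    base_means: (c, d) per-class feature means of the base classes.
    base_covs:  (c, d, d) per-class feature covariance matrices.
    k, alpha:   hypothetical hyperparameters (neighbor count, covariance offset).
    """
    # Assumed similarity measure: Euclidean distance to each base-class mean.
    dists = np.linalg.norm(base_means - novel_feat, axis=1)
    nearest = np.argsort(dists)[:k]

    # Assumed combination rule: average the novel feature with the k nearest means.
    mean = (base_means[nearest].sum(axis=0) + novel_feat) / (k + 1)

    # Average the neighbors' covariances, plus a small diagonal offset.
    cov = base_covs[nearest].mean(axis=0) + alpha * np.eye(novel_feat.shape[0])
    return mean, cov, nearest


def augment(novel_feat, base_means, base_covs, n_samples=100, k=2, alpha=0.1, seed=0):
    """Draw extra features from the pseudo distribution and prepend the real one."""
    rng = np.random.default_rng(seed)
    mean, cov, _ = pseudo_distribution(novel_feat, base_means, base_covs, k, alpha)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    return np.vstack([novel_feat[None, :], samples])
```

In this sketch, the returned array (one real feature followed by `n_samples` sampled ones) would be used as the novel class's training set for a simple classifier; how the paper weights whole-set and subset statistics is not specified in the abstract and is not modeled here.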