SP$^2$T: Sparse Proxy Attention for Dual-stream Point Transformer

Jiaxu Wan, Hong Zhang, Ziqi He, Qishu Wang, Ding Yuan, Yifan Yang
2024-12-16
Abstract: In 3D understanding, point transformers have yielded significant advances in broadening the receptive field. However, further enlarging the receptive field is hindered by the constraints of grouping attention. Proxy-based models, a popular topic in image and language feature extraction, use global or local proxies to expand the model's receptive field. However, global proxy-based methods fail to precisely determine proxy positions and are not suited to tasks such as segmentation and detection in point clouds, while existing local proxy-based methods for images face three difficulties: balancing global and local information, sampling proxies across diverse point clouds, and computing cross-attention in parallel for sparse associations. In this paper, we present SP$^2$T, a local proxy-based dual-stream point transformer that promotes a global receptive field while maintaining a balance between local and global information. For robust 3D proxy sampling, we propose spatial-wise proxy sampling with vertex-based point-proxy associations, ensuring robust sampling for point clouds of many different scales. For economical association computation, we introduce sparse proxy attention combined with a table-based relative bias, which enables low-cost and precise interactions between proxy and point features. Comprehensive experiments across multiple datasets show that our model achieves state-of-the-art (SOTA) performance in downstream tasks. The code has been released at https://github.com/TerenceWallel/Sparse-Proxy-Point-Transformer.
Computer Vision and Pattern Recognition, Artificial Intelligence
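The abstract names three ingredients: spatial-wise proxy sampling, vertex-based point-proxy association, and sparse proxy attention with a table-based relative bias. The Python sketch below shows one way these pieces could fit together. It is a toy illustration under stated assumptions, not the released SP$^2$T implementation. All of the following are hypothetical choices made for illustration: proxies sit on the vertices of a regular grid with spacing `cell`; each point is associated with the 8 corners of the cell that contains it; proxy features are initialized as the mean of the associated point features; query/key/value projections are omitted (identity); and the relative bias is a 1-D table indexed by a quantized point-to-vertex distance. The function names (`associate`, `init_proxies`, `bias_index`, `sparse_proxy_attention`) are likewise illustrative.

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical sketch, NOT the SP^2T code: grid-vertex proxies plus sparse
# point-to-proxy attention. All names and parameters are illustrative.


def cell_index(p, lo, cell):
    """Integer index of the grid cell that contains point p."""
    return tuple(int(math.floor((p[k] - lo[k]) / cell)) for k in range(3))


def associate(points, cell):
    """Assumed vertex-based association: each point links to the 8 corners
    (grid vertices, i.e. proxies) of the cell that contains it."""
    lo = [min(p[k] for p in points) for k in range(3)]
    assoc = []
    for p in points:
        c = cell_index(p, lo, cell)
        assoc.append([tuple(c[k] + o[k] for k in range(3))
                      for o in product((0, 1), repeat=3)])
    return lo, assoc


def init_proxies(feats, assoc):
    """Hypothetical proxy init: mean feature of the points linked to each vertex."""
    sums, counts = {}, defaultdict(int)
    for f, corners in zip(feats, assoc):
        for v in corners:
            acc = sums.setdefault(v, [0.0] * len(f))
            for i, x in enumerate(f):
                acc[i] += x
            counts[v] += 1
    return {v: [x / counts[v] for x in s] for v, s in sums.items()}


def bias_index(p, v, lo, cell, num_buckets):
    """Quantize the point-to-vertex distance into a bias-table bucket."""
    vpos = [lo[k] + v[k] * cell for k in range(3)]
    d = math.sqrt(sum((p[k] - vpos[k]) ** 2 for k in range(3)))
    d /= math.sqrt(3) * cell  # a corner of its own cell is at most one diagonal away
    return min(int(d * num_buckets), num_buckets - 1)


def sparse_proxy_attention(points, feats, cell, bias_table):
    """Each point attends only to its 8 associated proxies (identity Q/K/V)."""
    lo, assoc = associate(points, cell)
    proxies = init_proxies(feats, assoc)
    scale = 1.0 / math.sqrt(len(feats[0]))
    out = []
    for p, f, corners in zip(points, feats, assoc):
        # Scaled dot-product score plus a bias looked up from the table.
        logits = [
            scale * sum(a * b for a, b in zip(f, proxies[v]))
            + bias_table[bias_index(p, v, lo, cell, len(bias_table))]
            for v in corners
        ]
        m = max(logits)  # subtract the max for a numerically stable softmax
        w = [math.exp(x - m) for x in logits]
        z = sum(w)
        out.append([sum(wj * proxies[v][i] for wj, v in zip(w, corners)) / z
                    for i in range(len(f))])
    return out


points = [(0.1, 0.2, 0.3), (0.9, 0.8, 0.7), (1.5, 0.4, 0.2)]
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = sparse_proxy_attention(points, feats, cell=1.0, bias_table=[0.5, 0.0, -0.5, -1.0])
print(len(out), len(out[0]))  # 3 2
```

In this sketch, each point attends to exactly 8 proxies, so the attention cost grows linearly with the number of points. Dense attention over every point and every proxy would instead cost on the order of points × proxies. That gap is the kind of saving a sparse point-proxy association is meant to provide. For the actual sampling, association, and bias design, consult the released code linked in the abstract.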
What problem does this paper attempt to address?