Abstract: Point clouds are an important form of 3D data used in applications such as computer vision and autonomous driving, but their irregular and unordered nature makes them challenging to process. Recently, point-based neural networks have been widely adopted in 3D applications, and transformer-based models in particular have demonstrated state-of-the-art accuracy. However, three significant challenges remain: (1) Data interdependence hinders parallel execution in networks such as Point Transformer. (2) Farthest Point Sampling (FPS) incurs redundant memory accesses and computational overhead. (3) Intermediate results are repeatedly accessed in memory and recomputed between the FPS and kNN operators. Together, these limit Point Transformer’s processing speed to 17.80 frames per second on NVIDIA Jetson Orin, below the real-time requirement of around 30 frames per second. In this paper, we introduce an innovative Point Transformer accelerator that addresses these three challenges at three levels. On the computation graph level, our investigation reveals that Point Transformer’s performance suffers minimal degradation when it operates within a constrained receptive field. Leveraging this insight, the proposed accelerator strategically frees the MaxPool and attention-kNN layers, along with their associated data dependencies, at an inconsequential loss in accuracy. On the operator level, we identify that the variability in distance computation among the points accessed during FPS iterations contributes to redundant memory accesses and computational overhead. The proposed accelerator therefore uses a distribution-aware heuristic for distance calculation to minimize unnecessary memory accesses and redundant computation within the FPS operator.
On the architecture level, we observe that the transition-down process (comprising the FPS and kNN operations) accounts for 71.77% of the total inference time. The proposed accelerator therefore adopts an integrated FPS-kNN architecture that selects error-driven k neighbors, reducing repeated memory accesses and distance recalculations for intermediate results. In extensive experiments, the proposed accelerator achieves end-to-end speedups of up to 2.96×, 1.70×, and 1.19× over the state-of-the-art accelerators PointAcc, MARS, and PTrAcc, respectively, across a variety of point cloud neural networks.
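
The redundancy between the FPS and kNN operators mentioned above can be illustrated with a minimal Python sketch of the textbook baseline operators. This is not the accelerator's method: the function names, the cloud size, and the evaluation counters are assumptions introduced only for illustration, and the code uses brute-force search with no heuristic.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_sampling(points, m, start=0):
    """Textbook FPS: pick m indices, each farthest from the set chosen so far."""
    n = len(points)
    min_d = [math.inf] * n  # distance from each point to its nearest sample
    chosen = [start]
    evals = 0               # distance evaluations, counted for illustration
    for _ in range(m - 1):
        last = points[chosen[-1]]
        for i in range(n):  # every iteration reads every point
            d = sq_dist(points[i], last)
            evals += 1
            if d < min_d[i]:
                min_d[i] = d
        chosen.append(max(range(n), key=min_d.__getitem__))
    return chosen, evals


def knn(points, centers, k):
    """Brute-force kNN: the k closest point indices for each sampled center."""
    result, evals = [], 0
    for c in centers:
        # These point-to-center distances were already computed inside FPS
        # for all but the last sample, yet the baseline computes them again.
        dists = [(sq_dist(p, points[c]), i) for i, p in enumerate(points)]
        evals += len(points)
        result.append([i for _, i in sorted(dists)[:k]])
    return result, evals


random.seed(0)
cloud = [(random.random(), random.random(), random.random()) for _ in range(256)]
samples, fps_evals = farthest_point_sampling(cloud, m=16)
neighbors, knn_evals = knn(cloud, samples, k=8)
print(fps_evals, knn_evals)
```

With 256 points and 16 samples, FPS performs (16 - 1) × 256 distance evaluations and kNN performs another 16 × 256, of which all but one center's worth repeat work that FPS has already done. Sharing those intermediate distances is the kind of saving an integrated FPS-kNN design targets, although the sketch does not model how the proposed architecture achieves it.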

An Efficient Accelerator for Point-based and Voxel-based Point Cloud Neural Networks

Voxel-CIM: An Efficient Compute-in-Memory Accelerator for Voxel-based Point Cloud Neural Networks

Accelerating DNN-based 3D point cloud processing for mobile computing

A Point Transformer Accelerator with Distribution-Aware Heuristic Distance Calculation

A 28-nm Energy-Efficient Sparse Neural Network Processor for Point Cloud Applications Using Block-Wise Online Neighbor Searching

MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems

Pointer: An Energy-Efficient ReRAM-based Point Cloud Recognition Accelerator with Inter-layer and Intra-layer Optimizations

An Efficient FPGA Accelerator for Point Cloud

FusionArch: A Fusion-Based Accelerator for Point-Based Point Cloud Neural Networks

A Point Transformer Accelerator with Fine-Grained Pipelines and Distribution-Aware Dynamic FPS

ParallelNN: A Parallel Octree-based Nearest Neighbor Search Accelerator for 3D Point Clouds

Automatic Mapping of Heterogeneous DNN Models on Adaptive Multi-Accelerator Systems

A 28nm 2D/3D Unified Sparse Convolution Accelerator with Block-Wise Neighbor Searcher for Large-Scaled Voxel-Based Point Cloud Network

An Energy-Efficient Near-Data Processing Accelerator for DNNs that Optimizes Data Accesses

A Demonstration Platform for Large-Scaled Point Cloud Network Based on 28nm 2D/3D Unified Sparse Convolution Accelerator

A 3D Multi-Layer CMOS-RRAM Accelerator for Neural Network

TiPU: A Spatial-Locality-Aware Near-Memory Tile Processing Unit for 3D Point Cloud Neural Network

A Small-Footprint Accelerator for Large-Scale Neural Networks

AccSS3D: Accelerator for Spatially Sparse 3D DNNs

Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation

MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization