FLNA: Flexibly Accelerating Feature Learning Networks for Large-Scale Point Clouds with Efficient Dataflow Decoupling

Dongxu Lyu, Zhenyu Li, Yuzhou Chen, Gang Wang, Weifeng He, Ningyi Xu, Guanghui He
DOI: https://doi.org/10.1109/tvlsi.2024.3355126
2024-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Point cloud-based 3-D perception is poised to become a key workload in various applications. It generally places a feature learning network (FLN) before the backbone to obtain a uniform representation from the scattered points. Grid-based FLN (GFLN), which partitions point clouds into uniform grids, has become the main category in recent state-of-the-art (SOTA) works. However, it suffers from significant memory and computation inefficiency caused by high point sparsity and critical data dependency. To address these issues, we propose FLNA, a GFLN accelerator with algorithm-architecture co-optimization for large-scale point clouds. At the algorithm level, a dataflow-decoupling strategy is adopted to alleviate the processing bottlenecks caused by pipeline dependency. By exploiting the redundancy that arises from inherent sparsity and special operators, this strategy also reduces computation cost by 78.3%. Based on this algorithm co-optimization, an effective architecture is designed with efficient GFLN mapping and block-wise processing strategies. It substantially improves on-chip memory efficiency through diverse techniques, including a linked-list-based block lookup table (LUT) and transposed feature organization. Extensively evaluated on representative benchmarks, FLNA achieves a 69.9×–264.4× speedup with more than 99% energy savings compared with multiple GPUs and a CPU. It also delivers a substantial performance boost over SOTA point cloud accelerators while providing superior support for large-scale point clouds.
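The abstract does not describe how FLNA organizes its linked-list-based block LUT. As a rough illustration of the general idea behind grid-based feature learning, the Python sketch below bins points into a uniform grid and chains the points that share a cell through a linked list: one head entry per occupied cell, plus a next pointer per point. It then max-pools per-point features into one vector per occupied cell. Only occupied cells are stored, which hints at why sparsity matters. Everything here is a hypothetical software analogue, not FLNA's hardware design: the function names (`build_grid_lut`, `points_in_cell`, `max_pool_cells`), the flattened cell-index formula, and the choice of max pooling are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def build_grid_lut(
    points: Sequence[Point],
    voxel_size: float,
    grid_dims: Tuple[int, int, int],
) -> Tuple[Dict[int, int], List[int]]:
    """Hypothetical sketch: bin points into a uniform grid, chaining points per cell.

    Returns (head, nxt): head maps each *occupied* cell index to the most
    recently inserted point in that cell; nxt[i] is the next point in the
    same cell, or -1 at the end of the chain. Empty cells are never stored.
    """
    nx, ny, nz = grid_dims
    head: Dict[int, int] = {}
    nxt = [-1] * len(points)
    for i, (x, y, z) in enumerate(points):
        cx, cy, cz = int(x // voxel_size), int(y // voxel_size), int(z // voxel_size)
        if not (0 <= cx < nx and 0 <= cy < ny and 0 <= cz < nz):
            continue  # point lies outside the grid: skip it
        cell = (cx * ny + cy) * nz + cz  # flatten (cx, cy, cz) into one index
        nxt[i] = head.get(cell, -1)  # push point i onto the front of the cell's list
        head[cell] = i
    return head, nxt


def points_in_cell(head: Dict[int, int], nxt: List[int], cell: int) -> List[int]:
    """Walk one cell's linked list (most recently inserted point first)."""
    out: List[int] = []
    i = head.get(cell, -1)
    while i != -1:
        out.append(i)
        i = nxt[i]
    return out


def max_pool_cells(
    head: Dict[int, int], nxt: List[int], feats: Sequence[Sequence[float]]
) -> Dict[int, List[float]]:
    """Reduce per-point features to one vector per occupied cell (channel-wise max)."""
    pooled: Dict[int, List[float]] = {}
    for cell in head:
        idx = points_in_cell(head, nxt, cell)
        n_ch = len(feats[idx[0]])
        pooled[cell] = [max(feats[i][c] for i in idx) for c in range(n_ch)]
    return pooled


# Usage: three points, 1.0-unit cells, a 4x4x4 grid.
pts = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.5, 0.1, 0.1)]
feats = [[1.0, 5.0], [3.0, 2.0], [0.5, 0.5]]
head, nxt = build_grid_lut(pts, voxel_size=1.0, grid_dims=(4, 4, 4))
pooled = max_pool_cells(head, nxt, feats)
print(points_in_cell(head, nxt, 0))  # [1, 0]
print(pooled)  # {0: [3.0, 5.0], 16: [0.5, 0.5]}
```

Prepending to each cell's list keeps insertion O(1) per point. This sequential, pointer-chasing access pattern is one example of the kind of data dependency that makes scattered point processing awkward for hardware.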