BitNN: A Bit-Serial Accelerator for K-Nearest Neighbor Search in Point Clouds

Meng Han, Liang Wang, Limin Xiao, Hao Zhang, Tianhao Cai, Jiale Xu, Yibo Wu, Chenhao Zhang, Xiangrong Xu
DOI: https://doi.org/10.1109/isca59077.2024.00095
2024-01-01
Abstract: Point cloud-based machine perception applications have achieved great success in various scenarios. In this work, we focus on point cloud k-Nearest Neighbor (kNN) search, an important kernel for point clouds. Existing kNN acceleration techniques have overlooked operation-level optimization of the Euclidean distance computation, which suffers from low efficiency because of many unnecessary computations and varying data precision requirements. We reconsider point cloud kNN search from a new bit-serial computation perspective and propose BitNN, a bit-serial architecture for point cloud kNN search. BitNN supports adaptive precision processing and reduces unnecessary computation, significantly improving the performance and power efficiency of kNN search. To achieve this, we first propose a bit-serial computation method for kNN search, which derives a recursive expression to compute the Euclidean distance bit by bit. We then propose a dimension-wise point cloud encoding method and a point-wise data layout method to enable adaptive precision processing based on bit-serial computation. Furthermore, we present an early termination mechanism for bit-serial kNN search: by estimating a lower bound on the distance from only a few bits, many unnecessary computations can be skipped. Finally, we design an efficient bit-serial accelerator for kNN search that exploits massive parallelism to improve computing efficiency. We evaluate BitNN on several widely used point cloud datasets. BitNN achieves up to 6.6x speedup and 3.6x higher power efficiency compared to a comparably sized architecture. Moreover, BitNN can be easily integrated into existing bit-parallel kNN accelerators. We enhance the state-of-the-art kNN accelerator, ParallelNN, with bit-serial computation techniques, achieving up to 4.4x speedup and 2.9x higher power efficiency.
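The abstract does not give the recursive expression or the lower bound, so the following is only a minimal software sketch of the general idea, not the paper's method or hardware design. It assumes unsigned fixed-point coordinates of `bits` bits processed from the most significant bit down. The recursion used here is the standard identity (2d + δ)² = 4d² + 4dδ + δ², and the bound relies on the fact that the unseen low bits can change a per-dimension difference by less than one prefix unit. The names `bit_serial_sq_dist` and `knn` are hypothetical.

```python
import heapq


def bit_serial_sq_dist(p, q, bits=8, threshold=None):
    """Squared Euclidean distance computed one bit plane at a time (MSB first).

    Returns (distance, bit_planes_processed). If `threshold` is given and a
    lower bound on the distance exceeds it, returns (None, bit_planes_processed).
    Assumption: coordinates are unsigned integers that fit in `bits` bits.
    """
    dims = len(p)
    diff = [0] * dims  # difference of the bit prefixes seen so far, per dimension
    sq = [0] * dims    # square of that prefix difference, updated recursively
    for k in range(1, bits + 1):
        shift = bits - k  # number of low bits not yet seen
        for i in range(dims):
            delta = ((p[i] >> shift) & 1) - ((q[i] >> shift) & 1)  # -1, 0 or 1
            # (2d + delta)^2 = 4d^2 + 4d*delta + delta^2
            sq[i] = 4 * sq[i] + 4 * diff[i] * delta + delta * delta
            diff[i] = 2 * diff[i] + delta
        if threshold is not None:
            # Unseen bits change each difference by less than 2**shift, so
            # |p_i - q_i| >= max(|diff_i| - 1, 0) * 2**shift.
            lower = sum((max(abs(d) - 1, 0) << shift) ** 2 for d in diff)
            if lower > threshold:
                return None, k  # cannot beat the threshold: stop early
    return sum(sq), bits


def knn(query, points, k, bits=8):
    """Return the k nearest points as sorted (squared_distance, index) pairs."""
    heap = []  # max-heap emulated with negated distances
    for idx, pt in enumerate(points):
        # Once k candidates exist, the current k-th best distance is the threshold.
        threshold = -heap[0][0] if len(heap) == k else None
        dist, _ = bit_serial_sq_dist(pt, query, bits, threshold)
        if dist is None:
            continue  # pruned by the lower bound
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, idx))
    return sorted((-d, i) for d, i in heap)


points = [(10, 10, 10), (200, 200, 200), (12, 9, 11), (0, 255, 128)]
print(knn((11, 10, 10), points, k=2))  # [(1, 0), (3, 2)]
```

In this toy run, once two close candidates are known, the distant point (200, 200, 200) is rejected after only two of its eight bit planes, which illustrates the kind of saving the early termination mechanism targets. A bit-parallel design would instead compute the full-precision distance for every point.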