L-FNNG: Accelerating Large-Scale KNN Graph Construction on CPU-FPGA Heterogeneous Platform

Chaoqiang Liu, Xiaofei Liao, Long Zheng, Yu Huang, Haifeng Liu, Yi Zhang, Haiheng He, Haoyan Huang, Jingyi Zhou, Hai Jin
DOI: https://doi.org/10.1145/3652609
IF: 2.837
2024-03-15
ACM Transactions on Reconfigurable Technology and Systems
Abstract: Due to the high complexity of constructing exact k-nearest neighbor graphs, approximate construction has become a popular research topic. The NN-Descent algorithm is one of the representative in-memory algorithms. To effectively handle large datasets, existing state-of-the-art solutions combine the divide-and-conquer approach and the NN-Descent algorithm, where large datasets are divided into multiple partitions, and a subgraph is constructed for each partition before all the subgraphs are merged, reducing the memory pressure significantly. However, such solutions fail to address inefficiencies in large-scale k-nearest neighbor graph construction. In this paper, we propose L-FNNG, a novel solution for accelerating large-scale k-nearest neighbor graph construction on CPU-FPGA heterogeneous platform. The CPU is responsible for dividing data and determining the order of partition processing, while the FPGA executes all construction tasks to utilize the acceleration capability fully. To accelerate the execution of construction tasks, we design an efficient FPGA accelerator, which includes the Block-based Scheduling (BS) and Useless Computation Aborting (UCA) techniques to address the problems of memory access and computation in the NN-Descent algorithm. We also propose an efficient scheduling strategy that includes a KD-tree-based data partitioning method and a hierarchical processing method to address scheduling inefficiency. We evaluate L-FNNG on a Xilinx Alveo U280 board hosted by a 64-core Xeon server. On multiple large-scale datasets, L-FNNG achieves, on average, 2.3× construction speedup over the state-of-the-art GPU-based solution.
computer science, hardware & architecture
What problem does this paper attempt to address?
The paper targets the inefficiency of large-scale k-nearest neighbor (KNN) graph construction. It proposes L-FNNG, a method that accelerates large-scale KNN graph construction on a CPU-FPGA heterogeneous platform. The main points are:

1. **Algorithm Bottleneck Analysis**:
   - The paper first analyzes the performance bottlenecks of the existing NN-Descent algorithm on the CPU. The computation phase takes most of the execution time, so high-dimensional vector operations are the main bottleneck. Irregular memory access patterns and a large amount of useless computation lower execution efficiency further.
2. **Problems with Existing Solutions**:
   - Current methods that combine the divide-and-conquer strategy with the NN-Descent algorithm relieve the memory pressure of large datasets, but they remain inefficient when subgraphs are built and merged. For example, randomly partitioning the dataset can scatter a node's true neighbors across different partitions, so most edges within each subgraph connect distant nodes. Computing these edges adds work, and most of them are replaced during the final merge anyway.
3. **Core Contributions of L-FNNG**:
   - Proposes an efficient FPGA accelerator design, including Block-based Scheduling (BS) and Useless Computation Aborting (UCA) techniques, to optimize memory access and reduce unnecessary computation.
   - Develops a new scheduling strategy that uses a KD-tree to partition the dataset and employs a hierarchical processing method so that nodes connect to their nearest neighbors as early as possible, which accelerates convergence.
   - In experimental evaluations on a 64-core Xeon server equipped with a Xilinx Alveo U280 card, L-FNNG achieves an average construction speedup of 2.3× over the state-of-the-art GPU-based solution.

In summary, L-FNNG addresses the efficiency issues in large-scale KNN graph construction by optimizing the algorithm's execution, improving the data scheduling strategy, and exploiting the parallel computing capability of the FPGA.
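
Two of the ideas summarized above, aborting useless distance computations and partitioning the data with a KD-tree, can be illustrated in software. The following is a minimal Python sketch, not the paper's FPGA design. The function names `aborting_sq_dist` and `kd_partition`, the `chunk` granularity, and the split rule (median of the widest dimension) are all assumptions made for illustration; the summary does not specify how L-FNNG implements either step.

```python
def aborting_sq_dist(a, b, bound, chunk=4):
    """Squared Euclidean distance with early abort (hypothetical helper).

    Accumulates the distance chunk by chunk and returns None as soon as the
    partial sum reaches `bound` (e.g. the distance to the current worst
    neighbor), since the candidate can no longer enter the neighbor list.
    """
    total = 0.0
    for start in range(0, len(a), chunk):
        for x, y in zip(a[start:start + chunk], b[start:start + chunk]):
            diff = x - y
            total += diff * diff
        if total >= bound:
            return None  # remaining dimensions would be useless work
    return total


def kd_partition(points, ids, max_size):
    """Split `ids` KD-tree style until each partition holds <= max_size points.

    Assumed split rule: sort along the dimension with the widest spread and
    cut at the median, so nearby points tend to land in the same partition.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(ids) <= max_size:
        return [ids]

    def spread(d):
        vals = [points[i][d] for i in ids]
        return max(vals) - min(vals)

    axis = max(range(len(points[ids[0]])), key=spread)
    ordered = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return (kd_partition(points, ordered[:mid], max_size)
            + kd_partition(points, ordered[mid:], max_size))
```

In contrast to random partitioning, a spatial split of this kind keeps likely neighbors together, so edges built inside a partition are less likely to be discarded at merge time. The early abort trades a comparison per chunk for skipping the rest of a high-dimensional distance once the candidate is known to be too far.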