DF-BETA: An FPGA-based Memory Locality Aware Decision Forest Accelerator via Bit-Level Early Termination

Daichi Tokuda, Shinya Takamaeda-Yamazaki
DOI: https://doi.org/10.1145/3706114
IF: 2.837
2024-12-03
ACM Transactions on Reconfigurable Technology and Systems
Abstract: Decision forests, particularly Gradient Boosted Decision Trees (GBDT), are popular due to their high prediction performance and computational efficiency, making them suitable for embedded systems with constraints on circuit size and available energy. In this study, we propose a new lightweight GBDT inference acceleration mechanism through hardware and algorithm co-design. First, we present LoADPack, a hardware-friendly GBDT algorithm that enhances memory access locality. By unifying some nodes and aligning the memory access patterns, LoADPack obtains trees in which the features and thresholds used across the entire ensemble are regular regardless of branching direction. Furthermore, we present DF-BETA, a resource-efficient accelerator for the LoADPack algorithm. DF-BETA utilizes MSB-first bit-serial computation to determine the result of comparisons between 32-bit floating-point numbers early, optimizing the operation that determines a branch direction. The hardware complexity and computation termination speed vary with the granularity of bit-serial computation. Therefore, we conduct design space exploration of DF-BETA to identify the optimal configuration. Our findings reveal that using 4-bit-serial comparators minimizes circuit size while achieving the highest throughput. Compared to running unconstrained GBDT on a typical accelerator with 32-bit bit-parallel comparators, our accelerator achieves 1.6 times higher throughput on average while maintaining comparable accuracy.
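The MSB-first, early-terminating comparison described in the abstract can be illustrated in software. The following is a minimal Python sketch, not the authors' hardware design. The function names `float32_key` and `bit_serial_less_equal` are hypothetical, and the sign handling (remapping each float32 bit pattern to an order-preserving unsigned key) is an assumption, since the abstract does not say how DF-BETA treats sign bits. NaN inputs are not handled, and `chunk_bits` is assumed to divide 32.

```python
import struct


def float32_key(x: float) -> int:
    """Map a float32 to a 32-bit unsigned key whose integer order follows float order.

    Assumption for this sketch: negative values have all bits flipped and
    non-negative values have the sign bit set. As a side effect, -0.0 sorts
    strictly below +0.0. NaN is not handled.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    if bits & 0x80000000:
        return ~bits & 0xFFFFFFFF
    return bits | 0x80000000


def bit_serial_less_equal(x: float, threshold: float, chunk_bits: int = 4):
    """Return (x <= threshold, chunks_examined), scanning from the MSB downward.

    The loop stops at the first chunk in which the two keys differ, which
    mimics early termination. chunk_bits must divide 32.
    """
    kx, kt = float32_key(x), float32_key(threshold)
    mask = (1 << chunk_bits) - 1
    steps = 0
    for shift in range(32 - chunk_bits, -1, -chunk_bits):
        steps += 1
        cx = (kx >> shift) & mask
        ct = (kt >> shift) & mask
        if cx != ct:
            # The most significant differing chunk decides the comparison.
            return cx < ct, steps
    # Every chunk matched, so the two values are equal.
    return True, steps
```

With the default 4-bit granularity, `bit_serial_less_equal(1.0, 2.0)` is decided after the first chunk, while comparing a value with itself requires all eight chunks. Varying `chunk_bits` gives a rough feel for the trade-off the abstract mentions between granularity and how quickly a comparison terminates.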
computer science, hardware & architecture