A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs

Qinggang Wang, Long Zheng, Zhaozeng An, Shuyi Xiong, Runze Wang, Yu Huang, Pengcheng Yao, Xiaofei Liao, Hai Jin, Jingling Xue
DOI: https://doi.org/10.1109/micro61859.2024.00040
2024-01-01
Abstract: Nowadays, high-level synthesis (HLS) has gained prominence for FPGA-based architecture prototyping, enhancing productivity significantly. Despite this advancement, HLS tools are impeded by a critical drawback: they lack support for dynamic memory management (DMM), leading to static memory allocation and suboptimal use of memory resources. In response, numerous efforts have been made to develop DMM solutions compatible with HLS. However, our analysis indicates that existing solutions fail to concurrently meet the desired trifecta of scalability (efficient management of memory of any size), efficiency (minimal latency in memory (de-)allocation), and robustness (low allocation failure rates). This limitation hampers their applicability in real-world scenarios. In this paper, we introduce GraDMM, a “three-birds-one-stone” solution that comprehensively enhances the scalability, efficiency, and robustness of DMM. The key insight is to formulate memory (de-)allocation as graph analytics and leverage sophisticated FPGA-based graph processing techniques. To achieve scalability, GraDMM specializes a simplified pipeline that significantly suppresses resource utilization expansion caused by managed memory scaling. This is crucial for managing arbitrarily sized memory on resource-limited FPGA platforms. For efficiency, GraDMM implements a data-centric concurrent traversal scheme and a shortcut-assisted fast traversal policy to accelerate (de-)allocation-guided graph traversal, reducing memory (de-)allocation latency. To enhance robustness, GraDMM incorporates an adaptive memory defragmenter that defragments managed memory to minimize fragmentation-induced allocation failures. GraDMM is encapsulated as a library, providing high-level interfaces for users and ensuring synthesizability with Vivado HLS.
Experimental results demonstrate that GraDMM outperforms three state-of-the-art HLS-compatible DMM solutions by significant margins: 56.71%-85.59% in resource consumption savings, 78.94%-99.99% in (de-)allocation latency improvement, and 10.71%-65.75% in allocation failure reduction.