SyncMalloc: A Synchronized Host-Device Co-Management System for GPU Dynamic Memory Allocation Across All Scales

Jiajian Zhang, Fangyu Wu, Hai Jiang, Guangliang Cheng, Genlang Chen, Qiufeng Wang
DOI: https://doi.org/10.1145/3673038.3673069
2024-01-01
Abstract: Dynamic memory allocation on GPUs is increasingly crucial for applications with dynamic computational patterns, but it faces significant challenges: allocation involves complex calculations with intricate branches, and the metadata generated by massive numbers of allocating threads consumes substantial memory. Despite existing research, a scalable and flexible solution that manages dynamic memory allocation effectively while minimizing memory usage on GPUs is still lacking. This paper introduces SyncMalloc, a synchronized Host-Device Co-Management system designed to handle dynamic memory allocations of diverse sizes. By integrating pipelining and producer-consumer mechanisms, SyncMalloc reduces communication overhead and resolves architectural mismatches; it also integrates with CUDA’s unified memory to support oversubscription. Moreover, SyncMalloc advances slab-based memory management to make small allocations more efficient, reducing conflict probabilities and overhead in high-activity scenarios. Finally, we present a comprehensive performance evaluation that expands benchmarks and measurement dimensions to reflect the performance of real-world applications more accurately. The experimental results demonstrate, from multiple perspectives, the effectiveness of SyncMalloc in supporting dynamic GPU allocations ranging from 4 B to 200 GB. Our source code is available at https://github.com/jjZhang94/SyncMalloc.
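
For readers unfamiliar with the slab-based management the abstract refers to, the following is a minimal, single-threaded sketch of the general idea, written in Python for brevity. It is not SyncMalloc's implementation: the names `Slab`, `SlabAllocator`, and `size_class`, the power-of-two size classes, and the 4-byte minimum are all assumptions made for illustration. On a GPU, the bitmap updates would have to be atomic operations, which this sketch omits.

```python
class Slab:
    """One slab: `num_slots` fixed-size slots tracked by a bitmap (hypothetical layout)."""

    def __init__(self, base, slot_size, num_slots):
        self.base = base              # starting offset of this slab within the pool
        self.slot_size = slot_size
        self.num_slots = num_slots
        self.bitmap = 0               # bit i set means slot i is in use

    def alloc(self):
        """Return the offset of a free slot, or None if the slab is full."""
        for i in range(self.num_slots):
            if not (self.bitmap >> i) & 1:
                self.bitmap |= 1 << i
                return self.base + i * self.slot_size
        return None

    def free(self, offset):
        """Mark the slot at `offset` as free again."""
        i = (offset - self.base) // self.slot_size
        self.bitmap &= ~(1 << i)

    def contains(self, offset):
        return self.base <= offset < self.base + self.slot_size * self.num_slots


def size_class(nbytes, min_size=4):
    """Round a request up to the next power-of-two size class (assumed policy)."""
    size = min_size
    while size < nbytes:
        size *= 2
    return size


class SlabAllocator:
    """Routes each small request to a slab of the matching size class."""

    def __init__(self, slots_per_slab=32):
        self.slots_per_slab = slots_per_slab
        self.slabs = {}       # size class -> list of Slab
        self.next_base = 0    # bump pointer used to carve new slabs from the pool

    def alloc(self, nbytes):
        cls = size_class(nbytes)
        for slab in self.slabs.setdefault(cls, []):
            offset = slab.alloc()
            if offset is not None:
                return offset
        # Every slab of this class is full: carve a new one.
        slab = Slab(self.next_base, cls, self.slots_per_slab)
        self.next_base += cls * self.slots_per_slab
        self.slabs[cls].append(slab)
        return slab.alloc()

    def free(self, offset, nbytes):
        for slab in self.slabs.get(size_class(nbytes), []):
            if slab.contains(offset):
                slab.free(offset)
                return
        raise ValueError("offset does not belong to any slab of this size class")
```

Because every slot in a slab has the same size, the per-allocation metadata reduces to one bit, which is the property that makes slab schemes attractive when a very large number of threads issue small requests.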