GPU Graph Processing on CXL-Based Microsecond-Latency External Memory

Shintaro Sano, Yosuke Bando, Kazuhiro Hiwada, Hirotsugu Kajihara, Tomoya Suzuki, Yu Nakanishi, Daisuke Taki, Akiyuki Kaneko, Tatsuo Shiozawa
DOI: https://doi.org/10.1145/3624062.3624173
2023-12-06
Abstract: In GPU graph analytics, the use of external memory such as the host DRAM and solid-state drives is a cost-effective approach to processing large graphs beyond the capacity of the GPU onboard memory. This paper studies the use of Compute Express Link (CXL) memory as alternative external memory for GPU graph processing in order to see if this emerging memory expansion technology enables graph processing that is as fast as using the host DRAM. Through analysis and evaluation using FPGA prototypes, we show that representative GPU graph traversal algorithms involving fine-grained random access can tolerate an external memory latency of up to a few microseconds introduced by the CXL interface as well as by the underlying memory devices. This insight indicates that microsecond-latency flash memory may be used as CXL memory devices to realize even more cost-effective GPU graph processing while still achieving performance close to using the host DRAM.