GATe: Streamlining Memory Access and Communication to Accelerate Graph Attention Network With Near-Memory Processing

Shiyan Yi, Yudi Qiu, Lingfei Lu, Guohao Xu, Yong Gong, Xiaoyang Zeng, Yibo Fan
DOI: https://doi.org/10.1109/lca.2024.3386734
IF: 2.3
2024-04-24
IEEE Computer Architecture Letters
Abstract: Graph Attention Network (GAT) has gained widespread adoption thanks to its exceptional performance. The critical components of a GAT model are aggregation and attention, which cause numerous main-memory accesses. Recently, much research has proposed near-memory processing (NMP) architectures to accelerate aggregation. However, graph attention requires additional operations distinct from aggregation, making previous NMP architectures less suitable for supporting GAT. In this paper, we propose GATe, a practical and efficient GAT accelerator with an NMP architecture. To the best of our knowledge, this is the first work that accelerates both attention and aggregation computation on DIMM. In the attention and aggregation phases, we unify feature vector access to reduce repetitive memory accesses and refine the computation flow to reduce communication. Furthermore, we introduce a novel sharding method that enhances data reusability. Experiments show that our work achieves speedups of up to 6.77× and 2.46× compared to the state-of-the-art NMP works GNNear and GraNDe, respectively.
computer science, hardware & architecture