A Comprehensive Reconfigurable Computing Approach to Memory Wall Problem of Large Graph Computation

Xu Wang, Yongxin Zhu, Linan Huang
DOI: https://doi.org/10.1016/j.sysarc.2016.04.010
IF: 5.836
2016-01-01
Journal of Systems Architecture
Abstract: Graph computation problems that exhibit irregular memory access patterns are known to perform poorly on multiprocessor architectures. Although recent studies use FPGA technology to tackle the memory wall problem of graph computation by adopting a massively multi-threaded architecture, the performance is still far below the optimal memory performance because of long memory access latency. In this paper, we propose a comprehensive reconfigurable computing approach to address the memory wall problem. First, we present an extended edge-streaming model with massive partitions to provide better load balance while taking advantage of the streaming bandwidth of external memory in processing large graphs. Second, we propose a two-level shuffle network architecture that significantly reduces the on-chip memory requirement while providing high processing throughput that matches the bandwidth of the external memory. Third, we introduce a compact storage design based on graph compression schemes and propose the corresponding encoding and decoding hardware to reduce the data volume transferred between the processing engines and external memory. We validate the effectiveness of the proposed architecture by implementing three frequently used graph algorithms on an ML605 board, showing up to a 3.85x improvement in performance-to-bandwidth ratio over previously published FPGA-based implementations. (C) 2016 Elsevier B.V. All rights reserved.
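To make the edge-streaming idea in the abstract more concrete, here is a minimal software sketch of a generic partitioned edge-streaming loop (scatter, shuffle, gather), applied to breadth-first search. It is not the paper's hardware design: the function names `partition_of` and `edge_streaming_bfs`, the contiguous-range partitioning, and the choice of BFS are all assumptions made for illustration.

```python
from collections import defaultdict

INF = float("inf")


def partition_of(v, num_vertices, num_partitions):
    """Map a vertex id to a partition by contiguous id ranges (an assumption)."""
    size = -(-num_vertices // num_partitions)  # ceiling division
    return v // size


def edge_streaming_bfs(num_vertices, edges, source, num_partitions=4):
    """Compute BFS levels over directed edges by streaming them sequentially."""
    level = [INF] * num_vertices
    level[source] = 0

    # Group edges by the partition of their source vertex, so that each
    # group can be read as one sequential stream.
    edge_parts = defaultdict(list)
    for u, v in edges:
        edge_parts[partition_of(u, num_vertices, num_partitions)].append((u, v))

    changed = True
    while changed:
        changed = False

        # Scatter: stream every edge once; an edge whose source has been
        # reached emits an update, which is shuffled into the bucket of
        # the destination vertex's partition.
        buckets = defaultdict(list)
        for p in range(num_partitions):
            for u, v in edge_parts[p]:
                if level[u] != INF:
                    dst = partition_of(v, num_vertices, num_partitions)
                    buckets[dst].append((v, level[u] + 1))

        # Gather: apply each partition's updates to its own vertices only,
        # so random accesses stay within one partition's vertex range.
        for p in range(num_partitions):
            for v, candidate in buckets[p]:
                if candidate < level[v]:
                    level[v] = candidate
                    changed = True

    return level
```

In this sketch the edge list is only ever read sequentially, and the random writes are confined to one partition's vertex range at a time. That is the general property that lets an edge-streaming design trade irregular external-memory accesses for streaming bandwidth plus a small working set.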