GUST: Graph Edge-Coloring Utilization for Accelerating Sparse Matrix Vector Multiplication

Armin Gerami, Bahar Asgari
2024-10-10
Abstract: Sparse matrix-vector multiplication (SpMV) plays a vital role in various scientific and engineering fields, from scientific computing to machine learning. Traditional general-purpose processors often fall short of their peak performance with sparse data, leading to the development of domain-specific architectures to enhance SpMV. Yet, these specialized approaches, whether tailored explicitly for SpMV or adapted from matrix-matrix multiplication accelerators, still face challenges in fully utilizing hardware resources as a result of sparsity. To tackle this problem, we introduce GUST, a hardware/software co-design, the key insight of which lies in separating multipliers and adders in the hardware, thereby enabling resource sharing across multiple rows and columns, leading to efficient hardware utilization and ameliorating negative performance impacts from sparsity. Resource sharing, however, can lead to collisions, a problem we address through a specially devised edge-coloring scheduling algorithm. Our comparisons with various prior domain-specific architectures using real-world datasets show the effectiveness of GUST, with an average hardware utilization of $33.67\%$.
Hardware Architecture
What problem does this paper attempt to address?
The problem this paper attempts to solve is how to improve hardware resource utilization for sparse matrix-vector multiplication (SpMV), and thereby reduce execution time and energy consumption.

### Problem Background

Sparse matrix-vector multiplication (SpMV) plays an important role in science and engineering, with applications ranging from scientific computing to machine learning. However, traditional general-purpose processors often fall short of their peak performance when processing sparse data, which leaves hardware resources underutilized. Although domain-specific architectures have been developed to improve SpMV performance, they still struggle to make full use of hardware resources in the presence of sparsity.

### Solutions Proposed in the Paper

To solve these problems, the paper introduces GUST (Graph Edge-Coloring Utilization for Accelerating Sparse Matrix Vector Multiplication), a hardware/software co-designed accelerator that aims to maximize hardware utilization. Specifically, GUST achieves this goal in the following ways:

1. **Separate Multipliers and Adders**: GUST separates the multipliers from the adders and connects them through a crossbar. This design lets multiple rows and columns share resources, which improves hardware utilization and reduces the performance degradation caused by sparsity.
2. **Prevent Collisions**: Resource sharing can lead to collisions, that is, multiple operations competing for the same resource at the same time. To solve this problem, GUST introduces a scheduling algorithm based on bipartite graph edge-coloring. The algorithm ensures that no two elements from the same row or the same column enter the multipliers or adders within the same time step, thus avoiding collisions.
3. **Load Balancing**: To further optimize performance, GUST also adopts a sorting-based load-balancing strategy that keeps the input stream as dense as possible and reduces idle cycles.

### Experimental Results

The paper evaluates GUST by comparing it with existing domain-specific architectures. The experimental results show that:

- On real-world sparse matrices, the average hardware utilization of GUST with a length of 256 is 33.67%.
- Compared with a one-dimensional (1D) systolic array with a length of 256, GUST achieves a 411-fold speedup and a 137-fold energy-efficiency improvement.
- Among nine real-world matrices, GUST has a shorter execution time on seven matrices and lower energy consumption on four matrices.

### Summary

Through hardware/software co-design, in particular by separating multipliers and adders and introducing a bipartite graph edge-coloring scheduling algorithm, GUST addresses the problem of low hardware resource utilization in sparse matrix-vector multiplication and improves both performance and energy efficiency.
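
The collision-avoidance idea described above can be illustrated with a small Python sketch. It treats rows and columns as the two vertex sets of a bipartite graph and each nonzero as an edge, then assigns every edge a color that serves as its time step. This is a simple greedy coloring chosen for illustration, not the scheduling algorithm from the paper; the function names `greedy_edge_coloring` and `spmv_from_schedule`, the `(row, col, value)` input format, and the example matrix are all assumptions, and the sketch ignores the crossbar, the fixed hardware length, and the load-balancing step.

```python
from collections import defaultdict


def greedy_edge_coloring(entries):
    """Assign each nonzero (row, col, value) a time step (its "color").

    Illustrative greedy coloring: each nonzero takes the smallest step not
    already used by another nonzero in the same row or the same column.
    """
    row_used = defaultdict(set)   # row -> steps already taken in that row
    col_used = defaultdict(set)   # col -> steps already taken in that column
    schedule = defaultdict(list)  # step -> nonzeros issued at that step
    for r, c, v in entries:
        step = 0
        while step in row_used[r] or step in col_used[c]:
            step += 1
        row_used[r].add(step)
        col_used[c].add(step)
        schedule[step].append((r, c, v))
    return [schedule[s] for s in sorted(schedule)]


def spmv_from_schedule(schedule, x, num_rows):
    """Compute y = A @ x by replaying the schedule one time step at a time."""
    y = [0.0] * num_rows
    for step in schedule:
        rows = [r for r, _, _ in step]
        cols = [c for _, c, _ in step]
        # Collision check: within a step, every row and column appears once.
        assert len(set(rows)) == len(rows), "row collision"
        assert len(set(cols)) == len(cols), "column collision"
        for r, c, v in step:
            y[r] += v * x[c]
    return y


# Hypothetical 3x3 example with five nonzeros.
entries = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
x = [1.0, 2.0, 3.0]
schedule = greedy_edge_coloring(entries)
y = spmv_from_schedule(schedule, x, num_rows=3)
print(len(schedule), y)
```

In this example no row or column holds more than two nonzeros, and the greedy pass packs the five nonzeros into two time steps. In general a greedy coloring may need more steps than an optimal bipartite edge coloring, which needs only as many colors as the maximum number of nonzeros in any single row or column.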