Entropy Maximization in Sparse Matrix by Vector Multiplication ($\max_E SpMV$)

Paolo D'Alberto,Abhishek Jain,Ismail Bustany,Henri Fraisse,Mansimran Benipal

2023-07-25

Abstract: The peak performance of any SpMV depends primarily on the available memory bandwidth and its effective use. GPUs, ASICs, and new FPGAs offer ever higher bandwidth; however, for large-scale and highly sparse matrices, SpMV remains a hard problem because of its random access pattern and workload imbalance. Here, we show how to turn randomness to our advantage. We propose a matrix permutation pre-processing step that aims to maximize the entropy of the distribution of the nonzero elements. We seek any permutation that distributes the nonzero elements uniformly, thereby generating an SpMV problem that is amenable to workload balancing or to faster sorting algorithms. We conjecture these permutations would be most effective for matrices with no dense rows or columns and, as in preconditioning, when the matrix is reused. We shall show that entropy maximization is an optimization that any architecture may take advantage of, although in different ways. Most importantly, any developer can consider and deploy it. We shall present cases where we can improve performance by 15% on AMD-based (GPU-CPU) systems.

Distributed, Parallel, and Cluster Computing

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper primarily explores the entropy maximization problem in sparse matrix-vector multiplication (SpMV). Specifically:

1. **Background Issues**:
   - The performance of SpMV is mainly limited by memory bandwidth and its efficient utilization.
   - For large-scale and highly sparse matrices, SpMV remains a challenge due to random access patterns and workload imbalance.
2. **Proposed Method**:
   - A matrix permutation preprocessing step is proposed to maximize the entropy of the non-zero element distribution.
   - The goal is to generate an SpMV problem that is easier to balance workload or accelerate sorting algorithms.
3. **Applicable Scenarios**:
   - Suitable for matrices without dense rows or columns, and more effective when the matrix is reused.
4. **Expected Effects**:
   - Experiments show that entropy maximization can improve performance, especially on AMD-based GPU-CPU systems, with performance improvements of up to 15%.
5. **Research Scope**:
   - Defines the basic concepts and notations of sparse matrix multiplication.
   - Describes the definitions and applications of randomization and entropy maximization.
   - Uses entropy as a measure of uniform distribution and demonstrates the entropy changes of different matrices after randomization.
   - Shows through experiments the impact of randomization on SpMV performance on different architectures (such as CPU and GPU).

In summary, the paper attempts to optimize the performance of sparse matrix multiplication through entropy maximization to overcome the workload imbalance issues present in existing algorithms and hardware architectures.
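The idea of using entropy as a measure of uniformity can be illustrated with a small sketch. This is not the paper's implementation: it assumes the matrix is stored in COO form, that "distribution" means the share of nonzeros falling in each block of a fixed 4x4 grid, and that a uniformly random row and column permutation stands in for the permutation search. The helper names `block_entropy` and `spmv_coo` and the tridiagonal toy input are hypothetical. With $p_b$ the fraction of nonzeros in block $b$, the entropy is $H = -\sum_b p_b \log p_b$, which is largest when every block holds the same share.

```python
import numpy as np

def block_entropy(rows, cols, shape, grid=(4, 4)):
    """Shannon entropy (nats) of the nonzero counts over a grid of blocks."""
    n_rows, n_cols = shape
    g_r, g_c = grid
    # Map every nonzero to the grid block that contains it.
    block_row = rows * g_r // n_rows
    block_col = cols * g_c // n_cols
    counts = np.bincount(block_row * g_c + block_col, minlength=g_r * g_c)
    p = counts[counts > 0] / counts.sum()  # empty blocks contribute 0
    return float(-(p * np.log(p)).sum())

def spmv_coo(rows, cols, vals, x, n_rows):
    """Reference y = A @ x for a COO matrix (clarity over speed)."""
    y = np.zeros(n_rows)
    np.add.at(y, rows, vals * x[cols])  # accumulates repeated row indices
    return y

# Toy input (assumption): a tridiagonal matrix, whose nonzeros cluster
# in the diagonal blocks of the grid.
n = 256
i = np.arange(n)
rows = np.concatenate([i, i[1:], i[:-1]])
cols = np.concatenate([i, i[:-1], i[1:]])
vals = np.ones(rows.size)
x = np.arange(n, dtype=float)

# Random relabeling: entry (r, c) moves to (row_perm[r], col_perm[c]).
rng = np.random.default_rng(0)
row_perm = rng.permutation(n)
col_perm = rng.permutation(n)
rows_p, cols_p = row_perm[rows], col_perm[cols]

h_before = block_entropy(rows, cols, (n, n))
h_after = block_entropy(rows_p, cols_p, (n, n))
print(f"entropy before: {h_before:.3f}, after: {h_after:.3f}, "
      f"upper bound: {np.log(16):.3f}")

# The permuted problem computes the same product once x is permuted on the
# way in and y is un-permuted on the way out.
x_perm = np.empty(n)
x_perm[col_perm] = x
y = spmv_coo(rows, cols, vals, x, n)
y_perm = spmv_coo(rows_p, cols_p, vals, x_perm, n)
assert np.allclose(y, y_perm[row_perm])
```

A random permutation only tends to raise the entropy for clustered inputs such as this one; it does not guarantee the maximum. Because the permutation is a one-time cost, the sketch also reflects why the approach is described as most useful when the matrix is reused across many products.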

Efficient Algorithm Design of Optimizing SpMV on GPU.

Improvement of Sparse Matrix-Vector Multiplication on GPU

SpV8 - Pursuing Optimal Vectorization and Regular Computation Pattern in SpMV.

Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms

A lightweight optimization selection method for Sparse Matrix-Vector Multiplication

GUST: Graph Edge-Coloring Utilization for Accelerating Sparse Matrix Vector Multiplication

PrSpMV: an Efficient Predictable Kernel for SpMV

Feature-based SpMV Performance Analysis on Contemporary Devices

Optimizing sparse matrix-vector multiplication based on gpu

Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity

Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors

A Systematic Literature Survey of Sparse Matrix-Vector Multiplication

Efficient sparse-matrix multi-vector product on GPUs

TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs

Exploring Better Speculation and Data Locality in Sparse Matrix-Vector Multiplication on Intel Xeon

FastLoad: Speeding Up Data Loading of Both Sparse Matrix and Vector for SpMV on GPUs

Auto-SpMV: Automated Optimizing SpMV Kernels on GPU

AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices

Toward Greener Matrix Operations by Lossless Compressed Formats

Optimization of SpGEMM with Risc-V vector instructions