Abstract: Sparse matrix operations involve a large number of zero operands, which makes most of the operations redundant. The redundancy grows when a matrix operation executes repeatedly on sparse data. Optimizing matrix operations for sparsity involves reorganizing either the data or the computations, at either compile time or run time. Although compile-time techniques avoid introducing run-time overhead, they are either limited to simple sparse matrix operations that generate dense output and handle immutable sparse matrices, or they require manual intervention to customize the technique to different matrix operations. We contribute a compile-time technique called SpComp that optimizes a sparse matrix operation by automatically customizing its computations to the positions of the non-zero values in the data. Our approach neither incurs run-time overhead nor requires manual intervention. It is also applicable to complex matrix operations that generate sparse output and handle mutable sparse matrices. We introduce a data-flow analysis, named Essential Indices Analysis, that statically collects symbolic information about the computations and helps the code generator reorganize them. The generated code consists of piecewise-regular loops that are free from indirect references and amenable to further optimization. SpComp-generated SpMSpV code shows a substantial performance gain when compared against the state-of-the-art TACO compiler and a piecewise-regular code generator. On average, we achieve a 79% performance gain against TACO and an 83% performance gain against the piecewise-regular code generator. When compared against the CHOLMOD library, SpComp-generated sparse Cholesky decomposition code shows a 65% performance gain on average.
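
To make the phrase "piecewise-regular loops, free from indirect references" concrete, here is a minimal Python sketch. It is not SpComp's algorithm or its generated code. It contrasts a generic CSR matrix-vector product, which reads the input vector through a column-index array, with a version specialized to a known sparsity structure. The function names (`spmv_csr`, `column_runs`, `spmv_runs`) and the small 4x4 example matrix are assumptions made for illustration; the abstract does not specify SpComp's storage format or interface.

```python
# Illustrative sketch only: names and data layout are hypothetical,
# not taken from SpComp.

def spmv_csr(row_ptr, col_idx, vals, x):
    """Generic CSR y = A @ x: every read of x goes through col_idx."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]  # indirect reference
    return y


def column_runs(row_ptr, col_idx):
    """Ahead-of-time step: per row, compress sorted column indices into
    runs of consecutive columns, stored as (first_col, first_k, length)."""
    runs = []
    for i in range(len(row_ptr) - 1):
        row_runs = []
        k = row_ptr[i]
        while k < row_ptr[i + 1]:
            start = k
            while k + 1 < row_ptr[i + 1] and col_idx[k + 1] == col_idx[k] + 1:
                k += 1
            row_runs.append((col_idx[start], start, k - start + 1))
            k += 1
        runs.append(row_runs)
    return runs


def spmv_runs(runs, vals, x):
    """y = A @ x over precomputed runs: each inner loop is a regular
    stride-1 loop and never consults col_idx."""
    y = [0.0] * len(runs)
    for i, row_runs in enumerate(runs):
        for first_col, first_k, length in row_runs:
            for d in range(length):
                y[i] += vals[first_k + d] * x[first_col + d]
    return y


# Hypothetical 4x4 matrix; row 2 is empty.
ROW_PTR = [0, 2, 5, 5, 8]
COL_IDX = [0, 1, 1, 2, 3, 0, 2, 3]
VALS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X = [1.0, 2.0, 3.0, 4.0]

RUNS = column_runs(ROW_PTR, COL_IDX)
```

Here `column_runs` stands in for work done once, before execution, because the positions of the non-zero values are known in advance. `spmv_runs(RUNS, VALS, X)` then returns the same result as `spmv_csr(ROW_PTR, COL_IDX, VALS, X)`. A real code generator would presumably emit the run bounds as loop constants rather than interpret a run table at run time.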

ReACT: Redundancy-Aware Code Generation for Tensor Expressions

The tensor algebra compiler

Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU

Compilation of Modular and General Sparse Workspaces

SpComp: A Sparsity Structure-Specific Compilation of Matrix Operations

Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration

CoNST: Code Generator for Sparse Tensor Networks

SySTeC: A Symmetric Sparse Tensor Compiler

Software for Sparse Tensor Decomposition on Emerging Computing Architectures

Automatic generation of efficient sparse tensor format conversion routines

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

TSCompiler: Efficient Compilation Framework for Dynamic-Shape Models

High-Performance Generalized Tensor Operations

MLIR-based code generation for GPU tensor cores

SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring

Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search

Compressing Structured Tensor Algebra

LazyTensor: combining eager execution with domain-specific compilers

Analyzing the Performance Portability of Tensor Decomposition

PolyJuice: Detecting Mis-compilation Bugs in Tensor Compilers with Equality Saturation Based Rewriting