Abstract: The tensor programming abstraction has become a foundational paradigm for modern computing. This framework allows users to write high-performance programs for bulk computation via a high-level imperative interface. Recent work has extended this paradigm to sparse tensors (i.e., tensors where most entries are not explicitly represented) with the use of sparse tensor compilers. These systems excel at producing efficient code for computation over sparse tensors, which may be stored in a wide variety of formats. However, they require the user to manually choose the order of operations and the data formats at every step. Unfortunately, these decisions are both highly impactful and complicated, requiring significant effort to manually optimize. In this work, we present Galley, a system for declarative sparse tensor programming. Galley performs cost-based optimization to lower these programs to a logical plan, then to a physical plan. It then leverages sparse tensor compilers to execute the physical plan efficiently. We show that Galley achieves high performance on a wide variety of problems including machine learning algorithms, subgraph counting, and iterative graph algorithms.

What problem does this paper attempt to address?

The paper addresses how to simplify and optimize the writing and execution of sparse tensor programs, so that users no longer have to choose the order of operations and the data formats by hand. This reduces the optimization burden and improves performance. Existing sparse tensor compilers (STCs) generate efficient code for sparse tensor computations, but they require the user to decide the order of operations, the formats of intermediate results, the loop order, and the algorithm used to merge (co-iterate over) sparse inputs. These decisions strongly affect performance, yet they are complex and demand substantial manual tuning, which makes sparse tensor programs hard to write well.

To address this, the authors propose Galley, a system for declarative sparse tensor programming. Its main components are:

1. **Automatic optimization**: Galley lowers the user's high-level sparse tensor program to an efficient physical execution plan through cost-based optimization, so users do not have to optimize it manually.
2. **Logical optimization**: Galley rewrites the input program as a sequence of aggregation steps chosen to minimize the total cost of computation and materialization.
3. **Physical optimization**: Galley selects the loop order, output formats, and merge algorithms for each step, producing efficient STC kernels.
4. **Sparsity estimation**: Galley introduces a statistical framework that estimates the sparsity of intermediate results to guide the optimization process.

With these techniques, Galley achieves significant performance improvements on a variety of tasks, such as machine learning algorithms, subgraph counting, and iterative graph algorithms.
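To make the choice of merge algorithm concrete, here is a minimal Python sketch of two ways to co-iterate over sparse vectors when computing a dot product. It is not Galley's code (Galley emits its kernels through an STC); the function names are hypothetical, and each vector is assumed to be stored as a sorted index array with a parallel value array.

```python
from bisect import bisect_left


def sparse_dot_two_pointer(a_idx, a_val, b_idx, b_val):
    """Dot product via a two-pointer merge over two sorted index arrays.

    A product is non-zero only where both inputs are non-zero, so the loop
    walks both arrays in lockstep and only multiplies on matching indices.
    """
    i = j = 0
    total = 0.0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            total += a_val[i] * b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            i += 1
        else:
            j += 1
    return total


def sparse_dot_search(a_idx, a_val, b_idx, b_val):
    """Same result, but iterate the shorter input and binary-search the longer one."""
    if len(a_idx) > len(b_idx):
        a_idx, a_val, b_idx, b_val = b_idx, b_val, a_idx, a_val
    total = 0.0
    lo = 0  # indices are sorted, so each search can resume where the last ended
    for k, v in zip(a_idx, a_val):
        lo = bisect_left(b_idx, k, lo)
        if lo == len(b_idx):
            break  # no larger indices remain in the longer input
        if b_idx[lo] == k:
            total += v * b_val[lo]
    return total
```

The two-pointer merge touches every stored entry of both inputs, O(nnz(a) + nnz(b)), while the search variant iterates only the shorter input, O(min · log max). Which is cheaper depends on how the operands' sparsities compare, which is the kind of trade-off a physical optimizer has to weigh.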
Experimental results show that Galley is 100 times faster than hand-optimized kernels on mixed dense-sparse workloads and 100 times faster than state-of-the-art databases on highly sparse workloads.

### Formula Representation

Some of the formulas in the article are reproduced below:

- Matrix chain multiplication:
  \[ E_{im}=\sum_{j,k,l}A_{ij}B_{jk}C_{kl}D_{lm} \]
- The cost model used in physical optimization:
  \[ \mathrm{cost}\approx a\cdot \mathrm{nnz}(\mathrm{Agg})+b\cdot \mathrm{nnz}(\mathrm{MapExpr}) \]
  where \(\mathrm{nnz}\) denotes the number of non-zero elements.

In this way, Galley not only simplifies writing sparse tensor programs but also significantly improves their execution efficiency.
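The following is a minimal sketch of how a cost model of the form \(a\cdot\mathrm{nnz}(\mathrm{Agg}) + b\cdot\mathrm{nnz}(\mathrm{MapExpr})\) could drive the choice of multiplication order for the matrix chain \(E_{im}\). It uses a dynamic program over parenthesizations and sums the cost of each multiplication step. Several assumptions here are illustrative and not taken from the paper. Non-zero counts come from a simple uniform-independence estimate (each entry is non-zero independently with its matrix's density), which is not necessarily Galley's statistics framework. \(\mathrm{nnz}(\mathrm{MapExpr})\) is read as the number of non-zero products \(A_{ij}B_{jk}\) before summing over \(j\), and \(\mathrm{nnz}(\mathrm{Agg})\) as the non-zeros in the materialized result. The weights default to \(a=b=1\), and the function names are hypothetical.

```python
from functools import lru_cache


def product_stats(m, n, p, da, db):
    """Uniform-independence estimates for C[i,k] = sum_j A[i,j] * B[j,k].

    A is m x n with density da, B is n x p with density db. Returns
    (non-zero products before summing over j, non-zeros in C, density of C).
    """
    nnz_map = m * n * p * da * db
    out_density = 1.0 - (1.0 - da * db) ** n  # P(at least one non-zero term)
    return nnz_map, m * p * out_density, out_density


def best_chain_order(shapes, densities, a=1.0, b=1.0):
    """Cheapest parenthesization of a matrix chain under the assumed cost
    a * nnz(Agg) + b * nnz(MapExpr), summed over all multiplication steps.

    shapes[t] = (rows, cols) of the t-th matrix; densities[t] is in [0, 1].
    """

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Best (cost, estimated density, plan) for the sub-chain i..j inclusive.
        if i == j:
            return 0.0, densities[i], chr(ord("A") + i)
        best = None
        for s in range(i, j):  # split into (i..s) times (s+1..j)
            lc, ld, lp = solve(i, s)
            rc, rd, rp = solve(s + 1, j)
            m, k, p = shapes[i][0], shapes[s][1], shapes[j][1]
            nnz_map, nnz_agg, d = product_stats(m, k, p, ld, rd)
            cost = lc + rc + a * nnz_agg + b * nnz_map
            if best is None or cost < best[0]:
                best = (cost, d, f"({lp}{rp})")
        return best

    cost, _, plan = solve(0, len(shapes) - 1)
    return plan, cost


if __name__ == "__main__":
    plan, cost = best_chain_order(
        [(1000, 1000), (1000, 1000), (1000, 1000), (1000, 1)],
        [0.01, 0.01, 0.01, 1.0],
    )
    print(plan)  # (A(B(CD)))
```

Here the first three matrices are square and sparse while \(D\) is a dense column. Multiplying from the right keeps every intermediate a 1000×1 vector. Starting from the left would first materialize a 1000×1000 product with an estimated density of roughly 0.095.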

Galley: Modern Query Optimization for Sparse Tensor Programs
