Abstract: Tensor algebra is a crucial component of data-intensive workloads such as machine learning and scientific computing. As the complexity of data grows, scientists often face a dilemma between highly specialized dense tensor algebra and the efficient structure-aware algorithms provided by sparse tensor algebra. In this paper, we introduce DASTAC, a framework that propagates the tensors' captured high-level structure down to low-level code generation by incorporating techniques such as automatic data layout compression, polyhedral analysis, and affine code generation. Our methodology reduces memory footprint by automatically detecting the best data layout, benefits heavily from polyhedral optimizations, leverages further optimizations, and enables parallelization through MLIR. Through extensive experimentation, we show that DASTAC achieves 1 to 2 orders of magnitude speedup over TACO, a state-of-the-art sparse tensor compiler, and StructTensor, a state-of-the-art structured tensor algebra compiler, with a significantly lower memory footprint.
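The idea that captured structure can drive polyhedral analysis and affine code generation can be illustrated with a minimal sketch. Assuming an upper-triangular matrix as the structured tensor, its nonzeros occupy the domain {(i, j) : 0 <= i <= j < n}, so a loop nest whose bounds are affine in the loop indices can skip the zero region entirely. The function names and the example matrix are hypothetical; this is hand-written Python, not code generated by DASTAC.

```python
# Hypothetical illustration of a structure-aware (affine) loop nest.
# U is upper triangular, so its nonzeros lie in {(i, j) : 0 <= i <= j < n}.

def matvec_dense(U, x):
    """Dense baseline: visits all n*n entries, including the zeros below the diagonal."""
    n = len(x)
    y = [0.0] * n
    visits = 0
    for i in range(n):
        for j in range(n):
            y[i] += U[i][j] * x[j]
            visits += 1
    return y, visits

def matvec_upper_affine(U, x):
    """Structure-aware version: the inner loop starts at j = i, an affine
    bound, so only the upper triangle is visited."""
    n = len(x)
    y = [0.0] * n
    visits = 0
    for i in range(n):
        for j in range(i, n):
            y[i] += U[i][j] * x[j]
            visits += 1
    return y, visits

n = 4
U = [[float(i + j + 1) if j >= i else 0.0 for j in range(n)] for i in range(n)]
x = [1.0, 2.0, 3.0, 4.0]

y_dense, v_dense = matvec_dense(U, x)
y_aff, v_aff = matvec_upper_affine(U, x)
print(y_aff, v_dense, v_aff)  # [30.0, 38.0, 39.0, 28.0] 16 10
```

For n = 4 the structure-aware nest performs 10 multiply-adds instead of 16 and produces the same result; in general the work drops from n^2 to n(n+1)/2 iterations.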


### What problem does this paper attempt to solve?

This paper addresses the trade-off between dense tensor algebra and sparse tensor algebra when processing complex data. Specifically:

1. **Limitations of dense tensor algebra**
   - Dense tensor algebra frameworks support rich compile-time optimizations (such as vectorization, blocking, and parallelization), but because they rely on contiguous memory access patterns, they are not always optimal for real-world data.
   - When data contains many zero or duplicate elements, dense tensor algebra cannot exploit this structure, which leads to high memory usage and poor performance.
2. **Limitations of sparse tensor algebra**
   - Sparse tensor algebra frameworks improve algorithmic efficiency and memory usage by exploiting sparsity, but their irregular data structures make memory access patterns hard to predict, so they struggle to fully utilize the hardware.
   - As a result, sparse frameworks cannot reach the same level of low-level optimization as dense ones.
3. **Deficiencies of existing methods**
   - Current approaches focus on either dense or sparse tensor algebra and fail to combine the strengths of both.
   - Existing frameworks such as StructTensor can infer structural information, but they are limited in data layout compression and low-level code generation, so they cannot fully exploit that structure for optimization.

### Solutions Proposed in the Paper

To address these problems, the paper introduces the **DASTAC (Dense And Sparse Tensor Algebra Compiler)** framework. Its main contributions are:

1. **Combining the advantages of sparse and dense tensor algebra**
   - DASTAC propagates the high-level structural information of tensors down to low-level code generation, combining the efficient structure-aware algorithms of sparse tensor algebra with the high-performance low-level code optimizations of dense tensor algebra.
2. **Data layout compression**
   - DASTAC introduces a new symbolic indexing algorithm that compresses structured input tensors into a tightly packed vector, reducing the overall memory footprint and creating opportunities for vectorization.
   - Compared with traditional sparse layouts (such as CSR, CSC, and COO), this approach avoids indirect accesses and the cost of storing indices, computing each element's position directly from a symbolic index expression instead.
3. **Progressive code generation**
   - DASTAC uses MLIR dialects (intermediate representations) to lower the code step by step and apply additional compiler optimizations, including code motion, common subexpression elimination, and parallelization.
   - Through MLIR's Affine dialect, DASTAC generates structure-aware low-level code, enabling optimization at different levels of abstraction.
4. **Experimental verification**
   - In extensive experiments, DASTAC achieves speedups of 1 to 2 orders of magnitude over TACO (a state-of-the-art sparse tensor compiler) and StructTensor (a state-of-the-art structured tensor algebra compiler) in both sequential and multithreaded settings, while using significantly less memory.

In summary, DASTAC combines the strengths of sparse and dense tensor algebra into an efficient and flexible tensor algebra framework, overcoming the limitations of existing methods on complex data.
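The contrast between a tightly packed layout with direct index calculation and a sparse format with stored indices can be sketched for one simple structure. The example assumes an upper-triangular matrix and uses the classic packed-triangular layout, where the offset of entry (i, j) is a closed-form expression of the loop indices; this is only an analogy for symbolic indexing, not DASTAC's actual algorithm. The CSR version is the textbook format, and all function names are hypothetical. Hoisting the row offset out of the inner loop mimics, by hand, the kind of code motion attributed to the MLIR lowering pipeline.

```python
# Hypothetical illustration: packed upper-triangular storage with a closed-form
# ("symbolic") index versus CSR storage with stored column indices.

def pack_upper(U):
    """Pack the upper triangle (j >= i) of an n x n matrix row by row into a flat list."""
    n = len(U)
    return [U[i][j] for i in range(n) for j in range(i, n)]

def packed_index(i, j, n):
    """Offset of entry (i, j), j >= i, in the packed row-major upper triangle.
    Rows 0..i-1 hold n + (n-1) + ... + (n-i+1) = i*n - i*(i-1)/2 entries."""
    return i * n - i * (i - 1) // 2 + (j - i)

def matvec_packed(vals, x):
    """y = U x reading U from the packed layout; no index arrays are stored."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        # The row offset does not depend on j, so it is hoisted out of the
        # inner loop (loop-invariant code motion, done by hand here).
        row_base = packed_index(i, i, n)
        for j in range(i, n):
            y[i] += vals[row_base + (j - i)] * x[j]
    return y

def to_csr(U):
    """Build CSR arrays (row_ptr, col_idx, vals) from the nonzeros of U."""
    n = len(U)
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j in range(n):
            if U[i][j] != 0.0:
                col_idx.append(j)
                vals.append(U[i][j])
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals

def matvec_csr(row_ptr, col_idx, vals, x):
    """y = U x from CSR; each access to x goes through the stored col_idx array."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]  # indirect load
    return y

n = 4
U = [[float(i + j + 1) if j >= i else 0.0 for j in range(n)] for i in range(n)]
x = [1.0, 2.0, 3.0, 4.0]

packed = pack_upper(U)
row_ptr, col_idx, csr_vals = to_csr(U)
print(matvec_packed(packed, x))                   # [30.0, 38.0, 39.0, 28.0]
print(matvec_csr(row_ptr, col_idx, csr_vals, x))  # [30.0, 38.0, 39.0, 28.0]
print(len(packed), len(row_ptr) + len(col_idx) + len(csr_vals))  # 10 25
```

Both layouts give the same result, but for n = 4 the packed form stores 10 values and no indices, while CSR stores 25 numbers (10 values plus 15 integer indices) and reads `x` through an extra level of indirection. The closed-form offset only works because the shape of the nonzero region is known ahead of time, which is the kind of high-level structure the summary describes DASTAC as exploiting.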
