Compressing Structured Tensor Algebra

Mahdi Ghorbani, Emilien Bauer, Tobias Grosser, Amir Shaikhha
2024-07-19
Abstract: Tensor algebra is a crucial component for data-intensive workloads such as machine learning and scientific computing. As the complexity of data grows, scientists often encounter a dilemma between the highly specialized dense tensor algebra and efficient structure-aware algorithms provided by sparse tensor algebra. In this paper, we introduce DASTAC, a framework to propagate the tensors' captured high-level structure down to low-level code generation by incorporating techniques such as automatic data layout compression, polyhedral analysis, and affine code generation. Our methodology reduces memory footprint by automatically detecting the best data layout, heavily benefits from polyhedral optimizations, leverages further optimizations, and enables parallelization through MLIR. Through extensive experimentation, we show that DASTAC achieves 1 to 2 orders of magnitude speedup over TACO, a state-of-the-art sparse tensor compiler, and StructTensor, a state-of-the-art structured tensor algebra compiler, with a significantly lower memory footprint.
Programming Languages, Machine Learning, Mathematical Software
### What problem does this paper attempt to solve?

This paper addresses the trade-off between Dense Tensor Algebra and Sparse Tensor Algebra when dealing with complex data. Specifically:

1. **Limitations of Dense Tensor Algebra**:
   - Dense Tensor Algebra frameworks support rich compile-time optimizations (such as vectorization, blocking, and parallelization), but because they rely on contiguous memory access patterns, they are not always optimal for real-world data.
   - When data becomes complex and contains many zero or repeated elements, Dense Tensor Algebra cannot exploit this structural information, resulting in high memory usage and poor performance.
2. **Limitations of Sparse Tensor Algebra**:
   - Sparse Tensor Algebra frameworks improve algorithmic efficiency and memory usage by exploiting the sparsity of data, but their irregular data structures make memory access patterns hard to predict, which makes it difficult to fully utilize the hardware.
   - As a result, Sparse Tensor Algebra frameworks cannot reach the same level of optimization as Dense Tensor Algebra.
3. **Deficiencies of Existing Methods**:
   - Current methods focus on either Dense Tensor Algebra or Sparse Tensor Algebra and fail to combine the advantages of both.
   - Existing frameworks such as StructTensor can infer structural information, but they are limited in data layout compression and low-level code generation, so they cannot fully use that structure for optimization.

### Solutions Proposed in the Paper

To solve the above problems, the paper introduces the **DASTAC (Dense And Sparse Tensor Algebra Compiler)** framework. The main contributions of this framework include:

1. **Combining the Advantages of Sparse and Dense Tensor Algebra**:
   - DASTAC combines the efficient structure-aware algorithms of Sparse Tensor Algebra with the high-performance low-level code optimizations of Dense Tensor Algebra by propagating the high-level structural information of tensors down to low-level code generation.
2. **Data Layout Compression**:
   - DASTAC introduces a new symbolic indexing algorithm that compresses the input structured tensors into a tightly packed vector representation, reducing the overall memory footprint and providing opportunities for vectorization.
   - Compared with traditional sparse data layouts (such as CSR, CSC, and COO), DASTAC's method avoids indirect accesses and index storage overhead, replacing them with direct symbolic index calculations.
3. **Progressive Code Generation**:
   - DASTAC uses the intermediate languages (dialects) provided by the MLIR framework to lower the code progressively and apply additional compiler optimizations, including code motion, common subexpression elimination, and parallelization.
   - Through the Affine dialect of MLIR, DASTAC generates structure-aware low-level code, enabling optimization at different levels of abstraction.
4. **Experimental Verification**:
   - Through extensive experiments, DASTAC shows a speedup of 1 to 2 orders of magnitude over TACO (a state-of-the-art sparse tensor compiler) and StructTensor (a state-of-the-art structured tensor algebra compiler) in both sequential and multithreaded scenarios, with significantly lower memory usage.

In summary, DASTAC provides an efficient and flexible tensor algebra framework that combines the advantages of sparse and dense tensor algebra and addresses the limitations of existing methods in dealing with complex data.
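
To make the data layout compression point above more concrete, the following is a minimal sketch in Python, not DASTAC's actual algorithm or generated code. It assumes a lower-triangular matrix as the known structure, and all names (`packed_index`, `pack_lower_triangular`, `packed_matvec`, `to_csr`, `csr_matvec`) are hypothetical. It contrasts a tightly packed vector, whose offsets are computed by a closed-form index expression, with a CSR layout that stores column indices and row pointers and reads the input vector indirectly.

```python
def packed_index(i, j):
    """Offset of element (i, j), with j <= i, in a row-major packed lower triangle."""
    return i * (i + 1) // 2 + j


def pack_lower_triangular(dense):
    """Copy only the structurally nonzero region (j <= i) into a flat list."""
    n = len(dense)
    return [dense[i][j] for i in range(n) for j in range(i + 1)]


def packed_matvec(packed, x):
    """y = A @ x for lower-triangular A stored in packed form.

    The loop bounds and the offset are affine or closed-form in i and j,
    so no index arrays are read at run time.
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        base = i * (i + 1) // 2  # row offset, hoisted out of the inner loop
        acc = 0.0
        for j in range(i + 1):
            acc += packed[base + j] * x[j]
        y[i] = acc
    return y


def to_csr(dense):
    """Build a CSR representation: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for CSR-stored A; x is read indirectly through col_idx."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect access
        y.append(acc)
    return y


# Small illustrative input (made up for this sketch).
dense = [[1, 0, 0],
         [2, 3, 0],
         [4, 5, 6]]
x = [1, 1, 1]

packed = pack_lower_triangular(dense)
values, col_idx, row_ptr = to_csr(dense)

packed_words = len(packed)
csr_words = len(values) + len(col_idx) + len(row_ptr)

print(packed_matvec(packed, x), csr_matvec(values, col_idx, row_ptr, x))
print(packed_words, csr_words)
```

In this sketch the packed form stores only the values, while CSR additionally stores one column index per value plus the row pointers. The packed inner loop also walks a contiguous range with a fixed trip count per row, which is the kind of access pattern that dense-style optimizations such as vectorization typically rely on.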