Abstract: In scientific fields such as quantum computing, physics, chemistry, and machine learning, high-dimensional data are typically represented using sparse tensors. Tensor contraction is a popular operation used to extract meaning from or transform the input tensors. It is, however, computationally expensive, and its cost grows quadratically with the number of elements. For this reason, specialized algorithms have been created that operate only on the nonzero elements. Current sparse tensor contraction algorithms use sub-optimal data structures that perform unnecessary computations, which increases execution time and overall time complexity. We propose Swift, a novel algorithm for sparse tensor contraction that replaces costly sorting with more efficient grouping, uses better data structures to represent tensors, and employs a more memory-friendly hash table implementation. Swift is evaluated against the state-of-the-art sparse tensor contraction algorithm, demonstrating up to 20x speedup in various test cases and handling imbalanced input tensors significantly better.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the computational cost of high-dimensional sparse tensor contraction in scientific applications, especially in quantum computing, physics, chemistry, and machine learning. Specifically:

1. **High computational complexity**: The cost of tensor contraction grows quadratically with the number of elements, resulting in excessive execution time.
2. **Redundant computation**: Existing sparse tensor contraction algorithms rely on sub-optimal data structures that perform unnecessary computations, increasing execution time and overall time complexity.
3. **Irregular memory access patterns**: Because element coordinates are irregularly distributed, memory accesses during sparse tensor contraction are non-contiguous, which hurts cache performance.
4. **Unknown output tensor size**: The density of the output tensor cannot be known in advance, making it difficult to optimize memory allocation and data structures.
5. **Large amount of intermediate data**: Contraction generates many intermediate results that share the same coordinates, and these must be merged to produce the final output.

To solve these problems, the paper proposes a new algorithm, **Swift**, which improves the efficiency of sparse tensor contraction in three ways:

- **Replace sorting with grouping**: Avoids the expensive sorting of input tensors and instead uses a cheaper grouping step to ensure that elements with the same free mode are stored adjacently in memory.
- **Optimized data structure**: Uses a more efficient data structure to represent tensors, reducing unnecessary memory access and computation.
- **Cache-friendly hash table implementation**: Adopts a probing hash table instead of a chaining hash table to reduce the memory access latency caused by pointer chasing.

With these improvements, Swift achieves up to a 20x speedup over the state-of-the-art algorithm in various test cases and handles imbalanced input tensors significantly better.

### Summary

The core problem of the paper is improving the computational efficiency of sparse tensor contraction, especially for high-dimensional data. Swift improves performance by optimizing the data structures and algorithm design in the input processing, contraction, and accumulation stages.
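The grouping and probing ideas described above can be illustrated with a small sketch. This is not Swift's implementation: the names `group_by_key`, `ProbingAccumulator`, and `contract` are hypothetical, the tensors are reduced to two-index coordinate lists, and a Python dictionary stands in for whatever grouping structure the paper actually uses. The sketch only shows why a single grouping pass can replace a sort, and how a linear-probing table merges partial products that share an output coordinate without following pointers.

```python
from collections import defaultdict


def group_by_key(entries, key_of):
    """Bucket entries by key in one pass; no sorting is involved."""
    groups = defaultdict(list)
    for entry in entries:
        groups[key_of(entry)].append(entry)
    return groups


class ProbingAccumulator:
    """Open-addressing hash table (linear probing) that sums values per key."""

    def __init__(self, capacity=8):
        self._keys = [None] * capacity
        self._vals = [0.0] * capacity
        self._size = 0

    def _find_slot(self, key):
        # Walk consecutive slots until the key or an empty slot is found.
        cap = len(self._keys)
        i = hash(key) % cap
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) % cap
        return i

    def _grow(self):
        # Double the capacity and reinsert every stored pair.
        old = [(k, v) for k, v in zip(self._keys, self._vals) if k is not None]
        cap = 2 * len(self._keys)
        self._keys = [None] * cap
        self._vals = [0.0] * cap
        for k, v in old:
            i = self._find_slot(k)
            self._keys[i] = k
            self._vals[i] = v

    def add(self, key, value):
        # Keep the load factor at or below 1/2 so probing always terminates.
        if 2 * (self._size + 1) > len(self._keys):
            self._grow()
        i = self._find_slot(key)
        if self._keys[i] is None:
            self._keys[i] = key
            self._size += 1
        self._vals[i] += value

    def items(self):
        return {k: v for k, v in zip(self._keys, self._vals) if k is not None}


def contract(x, y):
    """Compute Z[i, j] = sum_k X[i, k] * Y[k, j] for coordinate lists.

    x and y are lists of ((row, col), value) holding only nonzero entries.
    """
    y_by_k = group_by_key(y, lambda e: e[0][0])  # Y indexed by contracted index k
    x_by_i = group_by_key(x, lambda e: e[0][0])  # X grouped by free index i
    result = {}
    for i, x_group in x_by_i.items():
        acc = ProbingAccumulator()  # merges partial products for output row i
        for (_, k), xv in x_group:
            for (_, j), yv in y_by_k.get(k, ()):
                acc.add(j, xv * yv)
        for j, v in acc.items().items():
            result[(i, j)] = v
    return result
```

For example, with `x = [((0, 0), 1.0), ((1, 1), 2.0), ((0, 1), 3.0)]` and `y = [((0, 0), 4.0), ((1, 0), 5.0), ((1, 2), 6.0)]`, `contract(x, y)` returns the four nonzero outputs `(0, 0): 19.0`, `(0, 2): 18.0`, `(1, 0): 10.0`, and `(1, 2): 12.0`. The two partial products for `(0, 0)` land in the same accumulator slot, which is the merging step the summary refers to.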

Swift: High-Performance Sparse Tensor Contraction for Scientific Applications
