Abstract: The volume and range of applications have expanded significantly, along with a concomitant increase in the variety of data sources. These trends have highlighted the need for more versatile analysis tools that offer greater opportunities for algorithmic development and computationally faster operations than the standard flat-view matrix approach. Tensors, or multi-way arrays, provide an algebraic framework that is naturally suited to data of such large volume, diversity, and veracity. Indeed, the associated tensor decompositions have demonstrated their potential in breaking the Curse of Dimensionality associated with traditional matrix methods, where an exponential increase in data volume leads to adverse or even intractable computational complexity. A key tool underpinning multi-linear manipulation of tensors and tensor networks is the standard Tensor Contraction Product (TCP). However, depending on the dimensionality of the underlying tensors, the TCP also comes at the price of high computational complexity. In this work, we resort to diagrammatic tensor network manipulation to calculate such products in an efficient and computationally tractable manner, by making use of the Tensor Train decomposition (TTD). This renders the underlying concepts easy to perceive, thereby enhancing intuition for the associated operations while preserving mathematical rigour. In addition to bypassing cumbersome multi-linear expressions, the proposed Tensor Train Contraction Product model is shown to significantly accelerate the underlying computations, as its cost is independent of tensor order and linear in the tensor dimension, whereas the standard approach performs the full computation at a cost exponential in tensor order.
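To make the contrast between a full contraction and a contraction carried out on Tensor Train cores concrete, the following is a minimal NumPy sketch. It is not the Tensor Train Contraction Product model proposed in this work; it only illustrates the general idea for the simplest case, the inner product of two tensors contracted over all modes. The helper names `random_tt`, `tt_to_full`, and `tt_inner` are hypothetical, as are the assumptions that every mode has the same size and that all internal TT ranks are equal.

```python
import numpy as np


def random_tt(order, dim, rank, rng):
    """Random TT cores of shape (r_prev, dim, r_next); boundary ranks are 1.

    Assumption for illustration: equal mode sizes and equal internal ranks.
    """
    ranks = [1] + [rank] * (order - 1) + [1]
    return [rng.standard_normal((ranks[k], dim, ranks[k + 1]))
            for k in range(order)]


def tt_to_full(cores):
    """Rebuild the full tensor from its cores (dim**order entries).

    Used here only to check the result; this is the expensive route.
    """
    full = cores[0]
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading one.
        full = np.tensordot(full, core, axes=([-1], [0]))
    # Drop the two boundary rank axes of size 1.
    return full.reshape(full.shape[1:-1])


def tt_inner(cores_a, cores_b):
    """Inner product of two TT tensors by sweeping over pairs of cores.

    Only a small (rank_a x rank_b) matrix is carried between steps, so the
    full tensors are never formed.
    """
    m = np.ones((1, 1))
    for ga, gb in zip(cores_a, cores_b):
        # Sum over both incoming rank indices and the shared mode index.
        m = np.einsum('ab,aic,bid->cd', m, ga, gb)
    return m.item()


rng = np.random.default_rng(0)
a = random_tt(order=6, dim=4, rank=3, rng=rng)
b = random_tt(order=6, dim=4, rank=3, rng=rng)

fast = tt_inner(a, b)                             # works on the cores only
slow = np.sum(tt_to_full(a) * tt_to_full(b))      # forms 4**6 entries twice
print(np.isclose(fast, slow))
```

In this sketch the sweep only ever touches the small cores, so the `dim**order` entries of the full tensors are needed solely for the reference check in `slow`.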

High-Performance Generalized Tensor Operations

High-Performance Tensor Contraction without Transposition

The tensor algebra compiler

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations

Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering

Software for Sparse Tensor Decomposition on Emerging Computing Architectures

Performance of linear solvers in tensor-train format on current multicore architectures

Automatic transformation of irreducible representations for efficient contraction of tensors with cyclic group symmetry

Performance Optimization for Sparse A(T)Ax in Parallel on Multicore CPU

GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse

A framework for load balancing of tensor contraction expressions via dynamic task partitioning

ReACT: Redundancy-Aware Code Generation for Tensor Expressions

BLAS-like Interface for Binary Tensor Contractions

High-Performance Tensor Learning Primitives Using GPU Tensor Cores

Reducing Computational Complexity of Tensor Contractions via Tensor-Train Networks

Code Generation and Performance Engineering for Matrix-Free Finite Element Methods on Hybrid Tetrahedral Grids

Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU

FreeTensor: A Free-Form DSL with Holistic Optimizations for Irregular Tensor Programs

Optimizing Distributed Tensor Contractions using Node-Aware Processor Grids

Automatic generation of efficient sparse tensor format conversion routines

GTCO: Graph and Tensor Co-Design for Transformer-Based Image Recognition on Tensor Cores