Ahmad Abdelfattah, Willow Ahrens, Hartwig Anzt, Chris Armstrong, Ben Brock, Aydin Buluc, Federico Busato, Terry Cojean, Tim Davis, Jim Demmel, Grace Dinh, David Gardener, Jan Fiala, Mark Gates, Azzam Haider, Toshiyuki Imamura, Pedro Valero Lara, Jose Moreira, Sherry Li, Piotr Luszczek, Max Melichenko, Jose Moeira, Yvan Mokwinski, Riley Murray, Spencer Patty, Slaven Peles, Tobias Ribizel, Jason Riedy, Siva Rajamanickam, Piyush Sao, Manu Shantharam, Keita Teranishi, Stan Tomov, Yu-Hsiang Tsai, Heiko Weichelt

Abstract: The standardization of an interface for dense linear algebra operations in the BLAS standard has enabled interoperability between different linear algebra libraries, thereby boosting the success of scientific computing, in particular in scientific HPC. Despite numerous efforts in the past, the community has not yet agreed on a standard for sparse linear algebra operations, for several reasons. One is that sparse linear algebra objects allow for many different storage formats, and different hardware may favor different storage formats. This makes the definition of a FORTRAN-style all-encompassing interface extremely challenging. Another reason is that, in contrast to dense linear algebra, in sparse linear algebra the size of the sparse data structure holding the result of an operation is not always known prior to the computation. Furthermore, compared with the standardization effort for dense linear algebra, we are late in the technology readiness cycle, and many production-ready software libraries using sparse linear algebra routines have implemented and committed to their own sparse BLAS interface. At the same time, there is demand for a standard that would improve interoperability and sustainability and allow for easier integration of building blocks. In an inclusive, cross-institutional effort involving numerous academic institutions, US National Labs, and industry, we spent two years designing a hardware-portable interface for basic sparse linear algebra functionality that serves user needs and is compatible with the different interfaces currently used by different vendors. In this paper, we present a C++ API for sparse linear algebra functionality, discuss the design choices, and detail how software developers retain considerable freedom in how they implement functionality behind this API.

Interface for Sparse Linear Algebra Operations
