Evolution of the SLATE linear algebra library

Mark Gates, Ahmad Abdelfattah, Kadir Akbudak, Mohammed Al Farhan, Rabab Alomairy, Daniel Bielich, Treece Burgess, Sébastien Cayrols, Neil Lindquist, Dalal Sukkari, Asim YarKhan
DOI: https://doi.org/10.1177/10943420241286531
2024-09-29
The International Journal of High Performance Computing Applications
Abstract: SLATE (Software for Linear Algebra Targeting Exascale) is a distributed, dense linear algebra library targeting both CPU-only and GPU-accelerated systems, developed over the course of the Exascale Computing Project (ECP). While it began with several documents setting out its initial design, significant design changes occurred throughout its development. In some cases, these were anticipated: an early version used a simple consistency flag that was later replaced with a full-featured consistency protocol. In other cases, performance limitations and software and hardware changes prompted a redesign. Sequential communication tasks were parallelized; host-to-host MPI calls were replaced with GPU device-to-device MPI calls; more advanced algorithms such as Communication Avoiding LU and the Random Butterfly Transform (RBT) were introduced. Early choices that turned out to be cumbersome, error prone, or inflexible have been replaced with simpler, more intuitive, or more flexible designs. Applications have been a driving force, prompting a lighter weight queue class, nonuniform tile sizes, and more flexible MPI process grids. Of paramount importance has been building a portable library that works across several different GPU architectures – AMD, Intel, and NVIDIA – while keeping a clean and maintainable codebase. Here we explore the evolving design choices and their effects, both in terms of performance and software sustainability.
computer science, theory & methods, interdisciplinary applications, hardware & architecture