Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures

Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson
DOI: https://doi.org/10.48550/arXiv.2408.05238
2024-08-06
Abstract: Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems that may be underdetermined, inconsistent, or both. In such cases, one generally seeks the least squares solution, which minimizes the residual of the problem; when the coefficient matrix has a nontrivial nullspace, this solution is further defined as the one with the smallest norm. This work presents several new techniques for solving least squares problems involving coefficient matrices that are so large that they do not fit in main memory. The implementations include both CPU and GPU variants. All techniques rely on complete orthogonal decompositions, which guarantee that both conditions of a least squares solution are met regardless of the rank properties of the matrix. Specifically, they rely on the recently proposed "randUTV" algorithm, which is particularly effective in strongly communication-constrained environments. A detailed precision and performance study reveals that the new methods, which operate on data stored on disk, are competitive with state-of-the-art methods that store all data in main memory.
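As a small in-memory illustration of the two conditions named in the abstract (minimal residual and, among all minimizers, minimal norm), the sketch below solves a rank-deficient least squares problem through a factorization of the form A = U T V^T. It is not the paper's out-of-core randUTV method: it assumes the SVD as a stand-in, since the SVD is the special case of a UTV factorization in which T is diagonal. The function name `min_norm_lstsq`, the tolerance `rtol`, and the test matrix sizes are hypothetical choices made for this example.

```python
import numpy as np

def min_norm_lstsq(A, b, rtol=1e-10):
    """Minimum-norm least squares solution of A x ~ b.

    Assumption: the SVD (A = U @ diag(s) @ Vt) stands in for a general
    complete orthogonal (UTV) decomposition; this is not randUTV.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Numerical rank: count singular values above a relative threshold.
    tol = rtol * s[0] if s.size else 0.0
    r = int(np.sum(s > tol))
    # Project b onto the range of A, invert the nonsingular r x r part,
    # and map back; x has no component in the nullspace of A.
    y = (U[:, :r].T @ b) / s[:r]
    x = Vt[:r].T @ y
    return x, r

# Hypothetical test problem: a 60 x 40 matrix of rank 10.
rng = np.random.default_rng(0)
m, n, k = 60, 40, 10
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
b = rng.standard_normal(m)

x, r = min_norm_lstsq(A, b)
# Residual is minimal when it is orthogonal to the range of A.
normal_eq_residual = np.linalg.norm(A.T @ (A @ x - b))
```

Because x is built only from the leading r right singular vectors, it is orthogonal to the nullspace of A, which is what makes it the smallest-norm minimizer. A general UTV factorization would replace the division by `s[:r]` with a triangular solve against the leading r x r block of T.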
Distributed, Parallel, and Cluster Computing; Performance