Butterfly factorization with error guarantees

Quoc-Tung Le, Rémi Gribonval, Elisa Riccietti, Léon Zheng
2024-11-07
Abstract: In this paper, we investigate the butterfly factorization problem, i.e., the problem of approximating a matrix by a product of sparse and structured factors. We propose a new formal mathematical description of such factors, that encompasses many different variations of butterfly factorization with different choices of the prescribed sparsity patterns. Among these supports, we identify those that ensure that the factorization problem admits an optimum, thanks to a new property called "chainability". For those supports we propose a new butterfly algorithm that yields an approximate solution to the butterfly factorization problem and that is supported by stronger theoretical guarantees than existing factorization methods. Specifically, we show that the ratio of the approximation error by the minimum value is bounded by a constant, independent of the target matrix.
Optimization and Control
What problem does this paper attempt to address?
The paper addresses the butterfly factorization problem: how to approximate a matrix by a product of sparse, structured factors. The authors propose a new formal description of these factors and identify which supports guarantee that the factorization problem admits an optimal solution. They also propose a new butterfly algorithm whose theoretical error guarantees are stronger than those of existing methods.

### Specific Problems and Goals

1. **Problem Description**:
   - Given a target matrix \(A\), find sparse factors \(X_1, X_2, \ldots, X_L\) such that \(A\) is approximated by \(\hat{A} = X_1X_2\cdots X_L\).
   - The nonzero positions of each factor \(X_{\ell}\) are restricted by a predefined support matrix \(S_{\ell}\), that is, \(\text{supp}(X_{\ell})\subseteq\text{supp}(S_{\ell})\).
2. **Optimization Goals**:
   - Minimize the approximation error in the Frobenius norm:
     \[ E_{\beta}(A):=\inf_{(X_{\ell})_{\ell = 1}^L}\|A - X_1X_2\cdots X_L\|_F \]
   - Here \(\beta\) denotes the architecture, i.e. the sequence of patterns \((\pi_{\ell})_{\ell = 1}^L\), and each \(X_{\ell}\) is a factor whose sparsity pattern is prescribed by \(\pi_{\ell}\). A product \(X_1X_2\cdots X_L\) of such factors is a butterfly matrix.

### Main Contributions

1. **Introduction of the Mathematical Description of "Kronecker-sparse Factors"**:
   - A new formal description of deformable butterfly factors is proposed, which restricts them to a Kronecker-product form.
2. **Definition of Chainability**:
   - Chainability is a stability property ensuring that the product of several Kronecker-sparse factors is still a Kronecker-sparse factor. It is proved that when the architecture \(\beta\) is chainable, problem (1.1) admits an optimal solution (Theorem 7.8).
3. **Analysis of the Set of Butterfly Matrices**:
   - For a chainable architecture \(\beta\), the set of butterfly matrices \(B_{\beta}\) is characterized in terms of a low-rank property that generalizes the complementary low-rank property.
4. **Definition of Redundancy**:
   - If a chainable architecture \(\beta\) is redundant, it can be replaced by a compressed (non-redundant) architecture \(\beta'\) such that \(B_{\beta}=B_{\beta'}\). Redundant architectures therefore offer no practical benefit for accelerating linear operations.
5. **Proposal of a New Butterfly Algorithm**:
   - A new butterfly algorithm (Algorithm 6.1) is proposed. It computes approximate solutions for non-redundant chainable architectures and introduces a new orthogonalization step that makes the error guarantees possible. The algorithm extends to redundant chainable architectures with the same theoretical guarantees.
6. **Error Bounds**:
   - It is proved that for a chainable architecture \(\beta\), the butterfly factors \((\hat{X}_{\ell})_{\ell = 1}^L\) output by the new algorithm satisfy the following error bound, where the constant \(C_{\beta}\) does not depend on the target matrix \(A\):
     \[ \|A-\hat{X}_1\hat{X}_2\cdots\hat{X}_L\|_F\leq C_{\beta}\cdot\inf_{(X_{\ell})_{\ell = 1}^L}\|A - X_1X_2\cdots X_L\|_F = C_{\beta}\cdot E_{\beta}(A) \]
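To make the problem statement concrete, here is a minimal NumPy sketch of the objects involved. It is not the paper's Algorithm 6.1, and it assumes one particular architecture, the classic square dyadic butterfly of size \(n = 2^L\), whose supports are \(S_{\ell} = I_{2^{\ell-1}} \otimes 1_{2\times 2} \otimes I_{n/2^{\ell}}\). The function names (`butterfly_supports`, `random_factors`, `respects_supports`, `butterfly_error`) are hypothetical and chosen only for illustration.

```python
from functools import reduce

import numpy as np


def butterfly_supports(L):
    """Binary supports S_1..S_L of the square dyadic butterfly, n = 2**L.

    Assumed pattern: S_ell = I_{2^(ell-1)} kron ones(2, 2) kron I_{n / 2^ell}.
    """
    n = 2 ** L
    supports = []
    for ell in range(1, L + 1):
        left = np.eye(2 ** (ell - 1), dtype=int)
        block = np.ones((2, 2), dtype=int)
        right = np.eye(n // 2 ** ell, dtype=int)
        supports.append(np.kron(np.kron(left, block), right))
    return supports


def random_factors(supports, rng):
    """Draw Gaussian factors X_ell, zeroed outside the prescribed supports."""
    return [rng.standard_normal(S.shape) * S for S in supports]


def respects_supports(factors, supports):
    """Check supp(X_ell) is contained in supp(S_ell) for every factor."""
    return all(np.all(X[S == 0] == 0) for X, S in zip(factors, supports))


def butterfly_error(A, factors):
    """Frobenius error ||A - X_1 X_2 ... X_L||_F for one choice of factors."""
    return np.linalg.norm(A - reduce(np.matmul, factors), ord="fro")


rng = np.random.default_rng(0)
supports = butterfly_supports(3)          # three 8 x 8 supports
factors = random_factors(supports, rng)   # one feasible point of the problem
target = rng.standard_normal((8, 8))      # an arbitrary dense target matrix
error = butterfly_error(target, factors)  # an upper bound on E_beta(target)
```

Each support here has only \(2n\) nonzero entries, yet the product of the supports is fully dense, which is why such factorizations can speed up matrix-vector products. The value returned by `butterfly_error` for any feasible factors only upper-bounds \(E_{\beta}(A)\); finding factors whose error is within a constant \(C_{\beta}\) of that infimum is what the paper's algorithm is designed to do.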