Abstract: Randomized trace estimation is a popular and well studied technique that approximates the trace of a large-scale matrix $B$ by computing the average of $x^T Bx$ for many samples of a random vector $X$. Often, $B$ is symmetric positive definite (SPD) but a number of applications give rise to indefinite $B$. Most notably, this is the case for log-determinant estimation, a task that features prominently in statistical learning, for instance in maximum likelihood estimation for Gaussian process regression. The analysis of randomized trace estimates, including tail bounds, has mostly focused on the SPD case. In this work, we derive new tail bounds for randomized trace estimates applied to indefinite $B$ with Rademacher or Gaussian random vectors. These bounds significantly improve existing results for indefinite $B$, reducing the number of required samples by a factor $n$ or even more, where $n$ is the size of $B$. Even for an SPD matrix, our work improves an existing result by Roosta-Khorasani and Ascher for Rademacher vectors. This work also analyzes the combination of randomized trace estimates with the Lanczos method for approximating the trace of $f(A)$. Particular attention is paid to the matrix logarithm, which is needed for log-determinant estimation. We improve and extend an existing result, to not only cover Rademacher but also Gaussian random vectors.
What problem does this paper attempt to address?
The paper addresses the following problem: **how to derive sharper tail bounds for randomized trace estimates of indefinite matrices and apply them to log-determinant estimation**. Specifically, it focuses on the following points:
1. **Improving existing tail bounds**: Prior analysis of randomized trace estimates, including tail bounds, has mostly focused on symmetric positive definite (SPD) matrices. The paper derives new tail bounds for indefinite \(B\) with Rademacher or Gaussian random vectors. Compared with existing results for indefinite matrices, these bounds reduce the number of required samples by a factor of \(n\) or more, where \(n\) is the size of \(B\). Even for SPD matrices, they improve an existing result by Roosta-Khorasani and Ascher for Rademacher vectors.
2. **Handling indefinite matrices**: Results for SPD matrices do not carry over directly when \(B\) is indefinite. This situation arises in log-determinant estimation: even if \(A\) is SPD, \(B = \log(A)\) is indefinite whenever \(A\) has eigenvalues both below and above 1, because the eigenvalues of \(\log(A)\) are the logarithms of the eigenvalues of \(A\). The bounds derived in the paper cover this case.
3. **Combining with the Lanczos method**: To estimate the trace of \(f(A)\), the paper analyzes the combination of randomized trace estimates with the Lanczos method, which is used to approximate each quadratic form \(x^T f(A)x\). Particular attention is paid to the matrix logarithm, which is needed for log-determinant estimation. An existing result is improved and extended to cover Gaussian as well as Rademacher random vectors.
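The Lanczos step in point 3 can be illustrated with a small sketch. It is not the paper's implementation: the function name `lanczos_quadratic_form`, the breakdown tolerance `1e-12`, and the use of full reorthogonalization are assumptions made here for clarity. The sketch approximates \(x^T f(A)x\) for a symmetric matrix \(A\) by running \(m\) Lanczos steps and applying \(f\) to the eigenvalues of the resulting tridiagonal matrix \(T_m\), that is, \(x^T f(A)x \approx \|x\|_2^2 \, e_1^T f(T_m) e_1\).

```python
import numpy as np

def lanczos_quadratic_form(A, x, m, f=np.log):
    """Approximate x^T f(A) x for symmetric A using m Lanczos steps.

    Hypothetical helper for illustration; f defaults to the logarithm,
    which requires A to be symmetric positive definite.
    """
    n = x.shape[0]
    norm_x = np.linalg.norm(x)
    q = x / norm_x
    q_prev = np.zeros(n)
    Q = np.zeros((n, m))          # stored basis, used for reorthogonalization
    alphas, betas = [], []
    beta = 0.0
    for j in range(m):
        Q[:, j] = q
        w = A @ q - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        # Full reorthogonalization against all previous Lanczos vectors
        # (an assumption of this sketch, chosen for numerical robustness).
        w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if beta < 1e-12 or j == m - 1:
            break                 # invariant subspace found or last step
        betas.append(beta)
        q_prev, q = q, w / beta
    # Tridiagonal matrix T with diagonal alphas and off-diagonals betas.
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    theta, S = np.linalg.eigh(T)
    # e_1^T f(T) e_1 = sum_k S[0, k]^2 * f(theta_k), scaled by ||x||^2.
    return norm_x**2 * float(np.sum(S[0, :] ** 2 * f(theta)))
```

Averaging this quantity over independent Rademacher or Gaussian vectors gives an estimate of \(\text{tr}(\log(A)) = \log(\det(A))\). With \(m = n\) steps and exact arithmetic, the quadratic form is reproduced exactly; in practice \(m\) is chosen much smaller than \(n\).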
### Formula summary
- **Trace estimation formula**:
\[
\text{tr}_N(B) := \frac{1}{N} \sum_{i = 1}^N (X^{(i)})^T B X^{(i)}
\]
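where \(X^{(1)}, \dots, X^{(N)}\) are independent copies of the random vector \(X\). For Gaussian and Rademacher vectors, the estimate is written \(\text{tr}_G^N(B)\) and \(\text{tr}_R^N(B)\), respectively.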
- **Relationship between determinant and trace**:
\[
\log(\det(A))=\text{tr}(\log(A))
\]
- **Tail bounds**:
For Gaussian random vectors:
\[
P\left( \left| \text{tr}_G^N(B)-\text{tr}(B) \right| \geq \varepsilon \right) \leq 2 \exp\left( -\frac{N\varepsilon^2}{4 \|B\|_F^2 + 4\varepsilon \|B\|_2} \right)
\]
For Rademacher random vectors:
\[
P\left( \left| \text{tr}_R^N(B)-\text{tr}(B) \right| \geq \varepsilon \right) \leq 2 \exp\left( -\frac{N\varepsilon^2}{8 \|B - D_B\|_F^2 + 8\varepsilon \|B - D_B\|_2} \right)
\]
where \(D_B\) denotes the diagonal matrix containing the diagonal entries of \(B\).
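As a rough illustration of how the formulas above are used, the following sketch implements the estimator and solves each tail bound for the number of samples \(N\). Setting the right-hand side of a bound to at most \(\delta\) gives \(N \geq c \ln(2/\delta) / \varepsilon^2\), where \(c\) is the denominator inside the exponential. The function names and the use of dense NumPy matrices are assumptions of this sketch; in large-scale settings \(B\) is typically available only through matrix-vector products, and the norms would have to be estimated or bounded.

```python
import numpy as np

def trace_estimate(B, num_samples, kind="rademacher", rng=None):
    """Average of x^T B x over num_samples independent random vectors."""
    rng = np.random.default_rng() if rng is None else rng
    n = B.shape[0]
    if kind == "rademacher":
        X = rng.choice([-1.0, 1.0], size=(n, num_samples))
    elif kind == "gaussian":
        X = rng.standard_normal((n, num_samples))
    else:
        raise ValueError("kind must be 'rademacher' or 'gaussian'")
    # Each column of X is one sample; compute all quadratic forms at once.
    return float(np.mean(np.sum(X * (B @ X), axis=0)))

def samples_gaussian(B, eps, delta):
    """Smallest N for which the Gaussian bound above is at most delta."""
    c = 4 * np.linalg.norm(B, "fro") ** 2 + 4 * eps * np.linalg.norm(B, 2)
    return int(np.ceil(c * np.log(2 / delta) / eps**2))

def samples_rademacher(B, eps, delta):
    """Smallest N for which the Rademacher bound above is at most delta."""
    off = B - np.diag(np.diag(B))   # B - D_B: off-diagonal part of B
    c = 8 * np.linalg.norm(off, "fro") ** 2 + 8 * eps * np.linalg.norm(off, 2)
    return int(np.ceil(c * np.log(2 / delta) / eps**2))
```

The Rademacher bound depends only on the off-diagonal part \(B - D_B\), where \(D_B\) is the diagonal part of \(B\). This reflects the fact that Rademacher entries square to 1, so the diagonal contributes exactly to every sample; for a diagonal \(B\) the estimate is exact and `samples_rademacher` returns 0.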
### Application background
Trace and log-determinant estimation arise in statistical learning (for example, maximum likelihood estimation for Gaussian process regression) and in lattice quantum chromodynamics. In applications that require estimating the determinant or trace of large matrices, sharper tail bounds mean that fewer samples are needed to reach a given accuracy and failure probability, which reduces the computational cost.
Together, these results give tighter guarantees for randomized trace estimation with indefinite matrices and extend the analysis of Lanczos-based estimators from Rademacher to Gaussian random vectors.