C. Li, A. Shkolnik
Abstract: Dimensionality reduction methods, such as principal component analysis (PCA) and factor analysis, are central to many problems in data science. There are, however, serious and well-understood challenges to finding robust low dimensional approximations for data with significant heteroskedastic noise. This paper introduces a relaxed version of Minimum Trace Factor Analysis (MTFA), a convex optimization method with roots dating back to the work of Ledermann in 1940. This relaxation is particularly effective at not overfitting to heteroskedastic perturbations and addresses the commonly cited Heywood cases in factor analysis and the recently identified "curse of ill-conditioning" for existing spectral methods. We provide theoretical guarantees on the accuracy of the resulting low rank subspace and the convergence rate of the proposed algorithm to compute that matrix. We develop a number of interesting connections to existing methods, including HeteroPCA, Lasso, and Soft-Impute, to fill an important gap in the already large literature on low rank matrix estimation. Numerical experiments benchmark our results against several recent proposals for dealing with heteroskedastic noise.
What problem does this paper attempt to address?
### Problems the paper attempts to solve
The paper "On Minimum Trace Factor Analysis: An Old Song Sung to a New Tune" addresses the problem of finding robust low-dimensional approximations of data with significant heteroskedastic noise. Specifically, it introduces a relaxed version of Minimum Trace Factor Analysis (MTFA), a convex optimization method whose origins trace back to Ledermann's work in 1940. The relaxation is particularly effective at avoiding overfitting to heteroskedastic perturbations, and it addresses both the Heywood cases common in factor analysis and the recently identified "curse of ill-conditioning" affecting existing spectral methods.
### Main contributions
1. **Iterative fixed-point method with a convergence rate guarantee**:
- Proposes an iterative fixed-point method and proves a convergence rate for it, a relatively rare contribution in the factor analysis literature.
2. **Optimal subspace recovery guarantee**:
- Establishes an optimal subspace recovery guarantee that is consistent with the minimax rate in the factor model setting.
3. **Overcoming the "curse of ill-conditioning"**:
- Existing subspace estimation methods degrade when the condition number of the signal is large. By leveraging convexity, the method in this paper requires no condition-number assumption and works under weaker conditions.
4. **Addressing Heywood cases in factor analysis**:
- The method ensures that Heywood cases do not occur under appropriate regularization.
5. **Connections with existing literature**:
- The method is connected to existing approaches such as HeteroPCA, Lasso, and Soft-Impute, filling a gap in the literature on low-rank matrix estimation.
### Mathematical optimization problems and their properties
This paper proposes the following optimization problem, given the covariance matrix \(\Sigma\) and the tuning parameter \(\tau>0\):
\[
\begin{aligned}
& \text{minimize} & & F(L, D):=\tau \|L\|_*+\frac{1}{2}\|\Sigma-(L + D)\|_F^2 \\
& \text{subject to} & & L\in S_p^+, \\
& & & D = P_{\text{diag}}(D).
\end{aligned}
\]
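Here \(S_p^+\) denotes the cone of \(p\times p\) symmetric positive semidefinite matrices, and the constraint \(D = P_{\text{diag}}(D)\) forces \(D\) to be diagonal. The objective function \(F\) can be regarded as a rescaled Lagrangian of the following constrained optimization problem: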
\[
\begin{aligned}
& \text{minimize} & & \|L\|_* \\
& \text{subject to} & & \|\Sigma-(L + D)\|_F^2\leq\psi, \\
& & & L\in S_p^+, \\
& & & D = P_{\text{diag}}(D).
\end{aligned}
\]
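A short worked step makes the rescaling explicit. It is only a sketch: the multiplier \(\lambda>0\), the symbol \(\mathcal{G}\), and the identification \(\tau = 1/(2\lambda)\) are introduced here for illustration and are not taken from the paper. The constraints \(L\in S_p^+\) and \(D = P_{\text{diag}}(D)\) are kept as explicit constraints rather than dualized.
\[
\begin{aligned}
\mathcal{G}(L, D;\lambda) &= \|L\|_* + \lambda\left(\|\Sigma-(L + D)\|_F^2-\psi\right), \\
\tau\,\mathcal{G}(L, D;\lambda) &= \tau\|L\|_* + \frac{1}{2}\|\Sigma-(L + D)\|_F^2-\frac{\psi}{2} = F(L, D)-\frac{\psi}{2}, \qquad \tau=\frac{1}{2\lambda}.
\end{aligned}
\]
Under this identification, \(F\) differs from \(\tau\,\mathcal{G}\) only by the constant \(\psi/2\), which does not depend on \((L, D)\) and so does not change the minimizers.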
In optimization terms, the paper treats the penalized problem (its equation (4)) as the dual problem and the constrained problem (its equation (5)) as the primal problem. Lemma 27 of the paper establishes the equivalence between the two.
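The penalized problem can be explored numerically with a simple alternating scheme. The sketch below is not the paper's algorithm: it is generic block coordinate descent, and the function names, the zero initialization of \(D\), and the stopping rule are assumptions made for illustration. It uses two facts: for a fixed diagonal \(D\), the minimizer over \(L\in S_p^+\) is obtained by soft-thresholding the eigenvalues of \(\Sigma - D\) at \(\tau\) and clipping at zero; for a fixed \(L\), the minimizer over diagonal \(D\) is the diagonal of \(\Sigma - L\).

```python
import numpy as np


def objective(Sigma, L, d, tau):
    """F(L, D) = tau * ||L||_* + 0.5 * ||Sigma - (L + D)||_F^2, with D = diag(d)."""
    nuclear = np.abs(np.linalg.eigvalsh(L)).sum()  # nuclear norm of a symmetric matrix
    resid = Sigma - L - np.diag(d)
    return tau * nuclear + 0.5 * np.sum(resid ** 2)


def psd_soft_threshold(M, tau):
    """Minimize tau * tr(L) + 0.5 * ||M - L||_F^2 over PSD L, for symmetric M."""
    evals, evecs = np.linalg.eigh(M)
    shrunk = np.maximum(evals - tau, 0.0)  # shrink eigenvalues, clip at zero
    return (evecs * shrunk) @ evecs.T


def alternating_minimization(Sigma, tau, n_iter=200, tol=1e-10):
    """Hypothetical helper: alternate exact updates of D and L (illustrative only)."""
    p = Sigma.shape[0]
    d = np.zeros(p)                      # assumed starting point: D = 0
    L = psd_soft_threshold(Sigma, tau)   # exact minimizer over L given D = 0
    history = [objective(Sigma, L, d, tau)]
    for _ in range(n_iter):
        d = np.diag(Sigma - L).copy()                     # exact update of diagonal D
        L = psd_soft_threshold(Sigma - np.diag(d), tau)   # exact update of PSD L
        history.append(objective(Sigma, L, d, tau))
        if history[-2] - history[-1] < tol:
            break
    return L, d, history


# Small synthetic example: low-rank PSD part plus unequal diagonal noise.
rng = np.random.default_rng(0)
B = rng.normal(size=(8, 2))
Sigma = B @ B.T + np.diag(rng.uniform(0.5, 3.0, size=8))
L_hat, d_hat, hist = alternating_minimization(Sigma, tau=0.1)
```

Because each step exactly minimizes \(F\) over one block with the other held fixed, the recorded objective values in `hist` should not increase. Inspecting `d_hat` for negative entries is one way to check whether a Heywood case appears for a given \(\tau\).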
### Statistical guarantees
#### Factor model
Consider independent and identically distributed samples \(Y_1,\ldots,Y_n\) generated by the model:
\[
Y = X + Z\in\mathbb{R}^p
\]
where \(E(X)=\mu\), \(Var(X)=L\), \(E(Z) = 0\), \(Var(Z_i)=\omega_i^2\), \(Z=(Z_1,\ldots,Z_p)^\top\), and \(X, Z_1,\ldots,Z_p\) are mutually independent. In particular, \(L\) has rank \(r\leq\phi(p)\) and spectral decomposition \(U\Lambda U^\top\).
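To make the model concrete, the following sketch draws samples from one instance of it. Several choices here are assumptions for illustration only: Gaussian distributions for both \(X\) and \(Z\) (the model above fixes only moments and independence), a loading matrix `B` with \(L = BB^\top\), and the particular values of `p`, `r`, `n`, and the noise variances. Under the stated independence, the covariance of \(Y\) is the low-rank part plus the diagonal noise part, which the sample covariance should approximate for large \(n\).

```python
import numpy as np


def simulate_factor_model(n, B, mu, omega2, rng):
    """Draw n samples of Y = X + Z with Var(X) = B B^T and Var(Z_i) = omega2[i].

    Gaussian draws are an illustrative assumption, not part of the model.
    """
    p, r = B.shape
    factors = rng.normal(size=(n, r))               # latent factors, identity covariance
    X = mu + factors @ B.T                          # signal with mean mu, covariance B B^T
    Z = rng.normal(size=(n, p)) * np.sqrt(omega2)   # independent noise, unequal variances
    return X + Z


rng = np.random.default_rng(1)
p, r, n = 6, 2, 100_000
B = rng.normal(size=(p, r))
mu = np.zeros(p)
omega2 = np.linspace(0.2, 5.0, p)                   # heteroskedastic noise variances

Y = simulate_factor_model(n, B, mu, omega2, rng)
S = np.cov(Y, rowvar=False)                         # sample covariance, p x p
Sigma = B @ B.T + np.diag(omega2)                   # population covariance of Y
max_err = np.abs(S - Sigma).max()
```

The noise variances in `omega2` differ by a factor of 25 across coordinates, which is the heteroskedastic regime the paper targets. A plain eigendecomposition of `S` would mix this diagonal noise into the estimated subspace.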
\[
\Sigma := Var(Y)=L + D,\quad D:=\text{diag}(\omega_1^2,\ldo