Abstract: We prove that there is a universal constant $C>0$ so that for every $d \in \mathbb N$, every centered subgaussian distribution $\mathcal D$ on $\mathbb R^d$, and every even $p \in \mathbb N$, the $d$-variate polynomial $(Cp)^{p/2} \cdot \|v\|_{2}^p - \mathbb E_{X \sim \mathcal D} \langle v,X\rangle^p$ is a sum of square polynomials. This establishes that every subgaussian distribution is \emph{SoS-certifiably subgaussian} -- a condition that yields efficient learning algorithms for a wide variety of high-dimensional statistical tasks. As a direct corollary, we obtain computationally efficient algorithms with near-optimal guarantees for the following tasks, when given samples from an arbitrary subgaussian distribution: robust mean estimation, list-decodable mean estimation, clustering mean-separated mixture models, robust covariance-aware mean estimation, robust covariance estimation, and robust linear regression. Our proof makes essential use of Talagrand's generic chaining/majorizing measures theorem.
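
As a sanity check on the form of this statement, consider the standard Gaussian $\mathcal D = N(0, I_d)$. This is an illustrative special case chosen here, not one worked out in the abstract. Since $\langle v, X\rangle \sim N(0, \|v\|_2^2)$, its even moments have a closed form, and the polynomial in question becomes a constant multiple of a power of $\|v\|_2^2 = \sum_i v_i^2$:

$$
\mathbb E_{X \sim N(0, I_d)} \langle v, X\rangle^p = (p-1)!! \, \|v\|_2^p,
\qquad
p^{p/2}\,\|v\|_2^p - \mathbb E_{X \sim N(0, I_d)} \langle v, X\rangle^p = \bigl(p^{p/2} - (p-1)!!\bigr) \Bigl(\sum_{i=1}^d v_i^2\Bigr)^{p/2}.
$$

Because $(p-1)!!$ is a product of $p/2$ odd numbers, each smaller than $p$, the coefficient $p^{p/2} - (p-1)!!$ is nonnegative. A nonnegative multiple of a power of a sum of squares is again a sum of squares, so $C = 1$ suffices in this special case. The theorem asserts that a certificate of this kind exists for every centered subgaussian distribution, where no such closed form is available.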

### What problem does this paper attempt to solve?

This paper sets out to prove that every subgaussian distribution is SoS-certifiably subgaussian, and then to derive a series of efficient algorithms from that fact. Specifically, the authors address the following core question:

**Question 1.5: Can we characterize all certifiably subgaussian distributions?**

#### Background and Motivation

A central goal in robust statistics is to design computationally efficient estimators that achieve near-optimal accuracy even when a substantial part of the data is contaminated. A typical problem is robust mean estimation: given a set of data points $S$ and a contamination parameter $\epsilon$, roughly a $(1 - \epsilon)$ fraction of the points are drawn from an unknown distribution $P$, and the remaining $\epsilon$ fraction may be arbitrary or adversarially chosen. The goal is to estimate the mean $\mu$ of $P$.

For Gaussian distributions, previous work gives polynomial-time algorithms that achieve $\ell_2$ error $\tilde{O}(\epsilon)$. However, the Gaussian assumption is often too restrictive to model practical scenarios, so researchers turn to more general families such as subgaussian distributions. Subgaussian distributions are a widely studied nonparametric family whose linear projections have tails that decay at least as fast as those of a Gaussian.

Information-theoretically, the mean of any subgaussian distribution can be robustly estimated to within error $\tilde{O}(\epsilon)$. For general subgaussian distributions, however, the best previously known error guarantee for efficient algorithms was $O(\epsilon^{1/2})$, which falls short of the Gaussian case.
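
The contamination model described above can be illustrated with a minimal one-dimensional sketch. This is not the paper's algorithm: it only contrasts the sample mean with the median, a classical robust estimator in one dimension. The function names, the fixed outlier location, and the parameter values are all assumptions made for illustration.

```python
import random
import statistics


def contaminated_sample(n, eps, true_mean=0.0, outlier_value=100.0, seed=0):
    """Return n points: about (1 - eps) * n inliers from N(true_mean, 1), the rest outliers.

    Placing every outlier at one fixed value is a simple stand-in for an
    adversary (an assumption of this sketch); a real adversary may choose
    the corrupted points arbitrarily.
    """
    rng = random.Random(seed)
    n_outliers = int(eps * n)
    inliers = [rng.gauss(true_mean, 1.0) for _ in range(n - n_outliers)]
    outliers = [outlier_value] * n_outliers
    return inliers + outliers


def estimation_errors(n=1000, eps=0.1, true_mean=0.0):
    """Return (error of the sample mean, error of the median) on contaminated data."""
    data = contaminated_sample(n, eps, true_mean)
    naive_error = abs(statistics.mean(data) - true_mean)
    robust_error = abs(statistics.median(data) - true_mean)
    return naive_error, robust_error


if __name__ == "__main__":
    naive, robust = estimation_errors()
    print(f"sample mean error: {naive:.3f}, median error: {robust:.3f}")
```

In this setup the outliers drag the sample mean far from the true mean, while the median moves only slightly. In high dimensions no such simple fix is available, which is where the efficient algorithms discussed in the paper come in.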
#### Main Contributions

The authors answer the question above by proving that every subgaussian distribution is certifiably subgaussian. Specifically, they prove the following theorem:

**Theorem 1.6 (Certifiability of Subgaussian Distributions)**: There exists a universal constant $C>0$ such that for any $s$-subgaussian random vector $X\sim P$ in $\mathbb{R}^d$, $P$ is $(Cs\sqrt{m}, m)$-certifiably bounded for every even $m$. In particular, $P$ is $Cs$-certifiably subgaussian.

It follows that algorithms designed for certifiably subgaussian distributions apply to every subgaussian distribution, across a range of high-dimensional statistical tasks, including:

- **Robust Mean Estimation**: subgaussian distributions, in the $\ell_2$ norm.
- **List-Decodable Mean Estimation**: subgaussian distributions, in the $\ell_2$ norm.
- **Mixture Model Clustering**: subgaussian distributions under a mean-separation assumption.
- **Robust Covariance Estimation**: hypercontractive subgaussian distributions, in the relative spectral norm.
- **Robust Linear Regression**: hypercontractive subgaussian distributions with arbitrary noise.

These results supply both new theoretical insight and more efficient, robust algorithms for high-dimensional data.

#### Technical Overview

To prove Theorem 1.6, the authors combine duality with Talagrand's generic chaining. Duality reduces the problem to bounding the expected supremum of an empirical process. Chaining then reduces this nonlinear empirical process to a linear one, and a concentration inequality for Gaussian processes completes the proof.
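
To make the kind of certificate in Theorem 1.6 concrete, here is a small worked case chosen for illustration (it is not taken from the paper). For $X$ uniform on $\{-1,+1\}^d$ and $m = 4$, a direct expansion gives $3\|v\|_2^4 - \mathbb E\langle v, X\rangle^4 = 2\sum_i (v_i^2)^2$, which is visibly a sum of squares of polynomials in $v$. The sketch below checks this identity numerically by enumerating all sign vectors; the function names are hypothetical.

```python
import itertools


def fourth_moment_rademacher(v):
    """Exact E<v, X>^4 for X uniform on {-1, +1}^d, by enumerating all 2^d sign vectors."""
    d = len(v)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=d):
        inner = sum(s * vi for s, vi in zip(signs, v))
        total += inner ** 4
    return total / 2 ** d


def certificate_sides(v):
    """Return (lhs, rhs) with lhs = 3 * ||v||^4 - E<v, X>^4 and rhs = 2 * sum_i (v_i^2)^2.

    The two sides agree for every v, and rhs is an explicit sum of squares.
    """
    norm_sq = sum(vi * vi for vi in v)
    lhs = 3 * norm_sq ** 2 - fourth_moment_rademacher(v)
    rhs = 2 * sum((vi * vi) ** 2 for vi in v)
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = certificate_sides([1.0, -2.0, 0.5])
    print(lhs, rhs)
```

A numerical check at a few points is not a proof, but here the identity also follows by hand from $\mathbb E\langle v, X\rangle^4 = \sum_i v_i^4 + 3\sum_{i \neq j} v_i^2 v_j^2$. For a general subgaussian distribution no such explicit formula exists, which is why the proof goes through duality and chaining instead.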
In summary, by proving that all subgaussian distributions are certifiably subgaussian, this paper provides a new theoretical basis and an efficient algorithmic framework for a series of high-dimensional statistical tasks.

SoS Certifiability of Subgaussian Distributions and its Algorithmic Applications

Sum-of-Squares & Gaussian Processes I: Certification

Efficient Certificates of Anti-Concentration Beyond Gaussians

Sum-of-squares lower bounds for Non-Gaussian Component Analysis

Fourier sum of squares certificates

Approximability and proof complexity

Private Algorithms for Stochastic Saddle Points and Variational Inequalities: Beyond Euclidean Geometry

Certifying Euclidean Sections and Finding Planted Sparse Vectors Beyond the $\sqrt{n}$ Dimension Threshold

Certification of Real Inequalities -- Templates and Sums of Squares

Data-Driven Distributionally Robust Safety Verification Using Barrier Certificates and Conditional Mean Embeddings

Nonlinear Random Matrices and Applications to the Sum of Squares Hierarchy

A unified approach to quantum de Finetti theorems and SoS rounding via geometric quantization

Machinery for Proving Sum-of-Squares Lower Bounds on Certification Problems

The power of sum-of-squares for detecting hidden structures

Certification of Distributional Individual Fairness

Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians

Weak Poincaré Inequalities, Simulated Annealing, and Sampling from Spherical Spin Glasses

Robust Sparse Mean Estimation via Sum of Squares

Approximation of optimization problems with constraints through kernel Sum-Of-Squares

ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection