What problem does this paper attempt to address?

This paper asks how well Empirical Risk Minimization (ERM) can learn in high-dimensional classification tasks when the data points come from a mixture model with heavy-tailed distributions. Specifically, it studies the classification of two data clouds, each generated by a Gaussian distribution with centroid \(\mu\) and a random variance \(\Delta\), in the high-dimensional limit where the number of samples \(n\) and the data dimension \(d\) both tend to infinity while their ratio \(\alpha = n/d\) remains fixed.

### Main problems

1. **Influence of non-Gaussian data distributions**: Research on high-dimensional classification usually assumes that data points follow a Gaussian or Gaussian mixture distribution. Real data, however, is often structured and heavy-tailed, and these features can strongly affect learning. The paper therefore examines how non-Gaussian data distributions, heavy-tailed ones in particular, affect classification performance.
2. **Role of regularization**: The paper analyzes how the regularization strength affects classification performance on non-Gaussian data and compares the results with the Gaussian case.
3. **Separability threshold**: The paper determines when a data set drawn from a non-Gaussian distribution stops being linearly separable. This amounts to finding a critical sample complexity \(\alpha^*\), below which the data set can be perfectly linearly separated.

### Research methods

- **Superstatistical method**: The paper adopts a "superstatistical" construction, in which the variance of the Gaussian distribution is itself drawn from a random distribution. This construction covers a large class of non-Gaussian distributions, including power-law and Cauchy distributions.
- **Replica method**: The paper uses the replica method, a tool widely used in statistical physics for complex optimization problems, to derive the asymptotic characterization of the ERM estimator.

### Main contributions

1. **Asymptotic analysis**: The paper provides the asymptotic characterization of the ERM estimator for classification tasks on non-Gaussian mixture models in the high-dimensional limit. The results hold for covariates with infinite variance and for any convex loss function and convex regularization.
2. **Performance analysis**: The paper analyzes classification performance for several convex loss functions (such as the quadratic and logistic losses) with ridge regularization. In particular, for two balanced non-Gaussian clusters, the optimal ridge regularization strength \(\lambda^*\) is finite, in contrast to \(\lambda^*\to\infty\) in the Gaussian case.
3. **Separability threshold**: The paper derives the separability threshold \(\alpha^*\) of the data set for a large class of non-Gaussian data distributions. This result generalizes the known asymptotic results on the separability of Gaussian clouds.
4. **Bayes-optimal performance**: Under certain moment conditions, the paper derives the Bayes-optimal performance of the binary classification task in the case of symmetric centroids.

### Experimental verification

The paper checks the theoretical predictions against numerical experiments. For the different shape parameters \(a\) and sample complexities \(\alpha\) considered, the predictions agree closely with the numerical results. In particular, classification performance under non-Gaussian data distributions differs significantly from performance under Gaussian data distributions, which confirms that the "Gaussian universality principle" fails under heavy-tailed distributions.

In summary, by introducing the superstatistical method, this paper systematically studies the impact of non-Gaussian data distributions on high-dimensional classification tasks and offers a new perspective on how machine learning behaves on complex data sets.
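The setting summarized above can be explored numerically with a short sketch. The code below is not taken from the paper: it assumes an inverse-gamma distribution with shape `a` for the random variance \(\Delta\) (one heavy-tailed choice), a centroid of fixed norm with per-coordinate noise variance \(\Delta\), and the square loss with ridge penalty, for which the ERM estimator has a closed form. The function names and the scaling of the ridge term are illustrative assumptions, and the replica predictions themselves are not computed here.

```python
import numpy as np


def sample_clouds(n, d, a, mu, rng):
    """Draw n points from two balanced clouds centred at +mu and -mu.

    Assumption: each point gets its own variance Delta ~ inverse-gamma(a, 1),
    so the marginal of each cloud is heavy-tailed (superstatistical).
    """
    y = rng.choice([-1.0, 1.0], size=n)                    # balanced labels
    delta = 1.0 / rng.gamma(shape=a, scale=1.0, size=n)    # inverse-gamma draws
    noise = np.sqrt(delta)[:, None] * rng.standard_normal((n, d))
    x = y[:, None] * mu[None, :] + noise
    return x, y


def ridge_erm(x, y, lam):
    """Closed-form minimizer of mean squared loss plus (lam / 2) * ||w||^2."""
    n, d = x.shape
    return np.linalg.solve(x.T @ x / n + lam * np.eye(d), x.T @ y / n)


def classification_error(w, x, y):
    """Fraction of points whose predicted sign differs from the label."""
    return float(np.mean(np.sign(x @ w) != y))


# Small illustrative run: alpha = n / d = 3, shape parameter a = 2.
rng = np.random.default_rng(0)
d, n, a = 100, 300, 2.0
mu = rng.standard_normal(d)
mu *= 1.0 / np.linalg.norm(mu)          # hypothetical centroid with unit norm

x_train, y_train = sample_clouds(n, d, a, mu, rng)
x_test, y_test = sample_clouds(5000, d, a, mu, rng)

for lam in (0.01, 1.0, 100.0):
    w = ridge_erm(x_train, y_train, lam)
    err = classification_error(w, x_test, y_test)
    print(f"lambda = {lam:g}: test error = {err:.3f}")
```

Scanning `lam` over a finer grid, and repeating the run for several values of `a` and of the ratio `n / d`, gives the kind of finite-size curves that the paper's asymptotic predictions are compared against. The numbers printed by this sketch depend on the assumed scalings and should not be read as the paper's results.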

Classification of Heavy-tailed Features in High Dimensions: a Superstatistical Approach

Robust Estimation and Shrinkage in Ultrahigh Dimensional Expectile Regression with Heavy Tails and Variance Heterogeneity

Semiparametric Expectile Regression for High-dimensional Heavy-tailed and Heterogeneous Data

Statistical Inference in Classification of High-dimensional Gaussian Mixture

High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality

The Breakdown of Gaussian Universality in Classification of High-dimensional Mixtures

A new approach in two-dimensional heavy-tailed distributions

Clustering using skewed multivariate heavy tailed distributions with flexible tail behaviour

High dimensional gaussian classification

Large deviations for a class of multivariate heavy-tailed risk processes used in insurance and finance

Nonparametric Mean and Variance Adaptive Classification Rule for High-Dimensional Data with Heteroscedastic Variances

An Efficient and Versatile Variational Method for High-Dimensional Data Classification

High-dimensional logistic entropy clustering

Universality of max-margin classifiers

High‐dimensional classification based on nonparametric maximum likelihood estimation under unknown and inhomogeneous variances

Heavy-Tailed Processes for Selective Shrinkage

Flexible Clustering with a Sparse Mixture of Generalized Hyperbolic Distributions

Nonparametric Bayes Classification via Learning of Affine Subspaces

Mean estimation and regression under heavy-tailed distributions--a survey

Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets

Central limit theorems for high dimensional dependent data