Abstract: This work addresses the problem of high-dimensional classification by exploring the generalized Bayesian logistic regression method under a sparsity-inducing prior distribution. The method uses a fractional power of the likelihood, resulting in the fractional posterior. Our study yields concentration results for the fractional posterior, not only on the joint distribution of the predictor and response variable but also for the regression coefficients. Significantly, we derive novel findings concerning misclassification excess risk bounds using sparse generalized Bayesian logistic regression. These results parallel recent findings for penalized methods in the frequentist literature. Furthermore, we extend our results to the scenario of model misspecification, which is of critical importance.

What problem does this paper attempt to address?

This paper asks how sparse generalized Bayesian logistic regression can be used for high-dimensional classification. Specifically, it considers the setting where the data dimension is much larger than the sample size, places a sparsity-inducing prior on the regression coefficients, and uses the fractional posterior to classify. It then derives guarantees comparable to those in the frequentist literature, in particular misclassification excess risk bounds.

### Main Contributions

1. **Concentration Properties of the Fractional Posterior**: The paper provides concentration results for the fractional posterior under different metrics, including α-Rényi divergence, Hellinger distance, and total variation distance. These results apply not only to the joint distribution but also to the distribution of the regression coefficients.
2. **Misclassification Excess Risk Bounds**: The paper derives misclassification excess risk bounds for sparse generalized Bayesian logistic regression in high-dimensional classification, and these results are comparable to those in the frequentist literature.
3. **Extension to Model Misspecification**: The paper further explores the concentration properties of the fractional posterior and the misclassification excess risk bounds when the model is misspecified.

### Methods

- **Fractional Posterior**: Construct the fractional posterior by raising the likelihood function to a fractional power, which helps to deal with model misspecification.
- **Sparse Prior**: Use a heavy-tailed distribution (such as the scaled Student's t-distribution) as a prior to induce sparsity.
- **Technical Tools**: Use technical tools such as PAC-Bayesian inequalities to derive concentration rates.

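
The objects named under Methods can be written out as a short sketch. The notation here is an assumption, not taken from the summary: β is the coefficient vector, (x_i, Y_i) are the n observations with Y_i coded in {0,1}, π is the prior, and ς is a prior scale. The prior shown is one common form of scaled Student's t prior (three degrees of freedom in each coordinate). The paper's exact prior may differ, for example by restricting the support.

```latex
% Logistic model and likelihood (assumed notation)
p_\beta(1 \mid x) = \frac{e^{x^\top \beta}}{1 + e^{x^\top \beta}},
\qquad
L_n(\beta) = \prod_{i=1}^{n} p_\beta(Y_i \mid x_i)

% Fractional posterior: likelihood raised to a power \alpha \in (0,1)
\pi_{n,\alpha}(\beta) \;\propto\; L_n(\beta)^{\alpha}\, \pi(\beta)

% One form of scaled Student's t prior (illustrative)
\pi(\beta) \;\propto\; \prod_{j=1}^{d} \left(\varsigma^{2} + \beta_j^{2}\right)^{-2}

% \alpha-Renyi divergence between distributions P and Q, \alpha \in (0,1)
D_\alpha(P, Q) = \frac{1}{\alpha - 1} \log \int (dP)^{\alpha} (dQ)^{1-\alpha}
```

Setting α = 1 in the second display would recover the usual posterior; the fractional posterior uses α strictly below 1.
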
### Results

- **Concentration Results**: The paper proves concentration results for the fractional posterior under α-Rényi divergence, and under certain conditions these results can be transformed into concentration results under Hellinger distance and total variation distance.
- **Misclassification Excess Risk**: The paper derives misclassification excess risk bounds for sparse generalized Bayesian logistic regression in high-dimensional classification, and these results are comparable to those in the frequentist literature.
- **Model Misspecification**: The paper also explores the concentration results and misclassification excess risk bounds in the case of model misspecification.

### Discussion

- **Computational Aspects**: Although the paper mainly focuses on theoretical properties, it also briefly discusses the computational aspects of the fractional posterior, especially the advantages of using the Langevin Monte Carlo method for sampling.
- **Prior Selection**: The paper compares the effects of different priors (such as the scaled Student's t-distribution and the spike-and-slab prior), and points out that although the scaled Student's t-distribution is conducive to sparsity, it lacks variable selection ability.

### Conclusion

By introducing the sparse generalized Bayesian logistic regression method, the paper addresses the challenges of high-dimensional classification, in particular achieving misclassification excess risk bounds comparable to those in the frequentist literature. These results are theoretically significant and also provide a new perspective for practical applications.
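
To make the Langevin Monte Carlo remark under Computational Aspects concrete, here is a minimal sketch of an unadjusted Langevin sampler targeting a fractional posterior for logistic regression. It is not the paper's implementation. The function names, the {0,1} response coding, the prior form ∏(scale² + β_j²)^(-2), and every default (`alpha`, `scale`, `step`, `n_iter`, `burn_in`) are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_fractional_posterior(beta, X, y, alpha, scale):
    """Unnormalized log fractional posterior: alpha * log-likelihood + log-prior.

    Assumed prior (illustrative): pi(beta) proportional to
    prod_j (scale^2 + beta_j^2)^(-2), a scaled Student's t form.
    """
    loglik = 0.0
    for xi, yi in zip(X, y):
        z = sum(a * b for a, b in zip(xi, beta))
        # y*z - log(1 + e^z), with a stable evaluation of log(1 + e^z)
        loglik += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    logprior = -2.0 * sum(math.log(scale ** 2 + b * b) for b in beta)
    return alpha * loglik + logprior


def grad_log_fractional_posterior(beta, X, y, alpha, scale):
    """Gradient of log_fractional_posterior with respect to beta."""
    # Prior part: d/db [-2 log(scale^2 + b^2)] = -4 b / (scale^2 + b^2)
    grad = [-4.0 * b / (scale ** 2 + b * b) for b in beta]
    for xi, yi in zip(X, y):
        z = sum(a * b for a, b in zip(xi, beta))
        r = alpha * (yi - sigmoid(z))  # tempered residual
        for j, xij in enumerate(xi):
            grad[j] += r * xij
    return grad


def langevin_draws(X, y, alpha=0.5, scale=1.0, step=1e-3,
                   n_iter=2000, burn_in=1000, seed=0):
    """Unadjusted Langevin algorithm; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    draws = []
    for t in range(n_iter):
        g = grad_log_fractional_posterior(beta, X, y, alpha, scale)
        # beta <- beta + step * gradient + sqrt(2 * step) * standard normal noise
        beta = [b + step * gj + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
                for b, gj in zip(beta, g)]
        if t >= burn_in:
            draws.append(beta)
    return draws
```

A call such as `langevin_draws(X, y, alpha=0.5)` takes `X` as a list of feature rows and `y` as a list of 0/1 labels. Averaging the returned draws coordinate-wise gives a point estimate of the coefficients. The sampler has no Metropolis correction, so its draws are only approximate and the step size would need tuning in practice.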

On high-dimensional classification by sparse generalized Bayesian logistic regression
