Abstract: Recently, there has been a significant focus on exploring the theoretical aspects of deep learning, especially regarding its performance in classification tasks. Bayesian deep learning has emerged as a unified probabilistic framework, seeking to integrate deep learning with Bayesian methodologies seamlessly. However, there exists a gap in the theoretical understanding of Bayesian approaches in deep learning for classification. This study presents an attempt to bridge that gap. By leveraging PAC-Bayes bounds techniques, we present theoretical results on the prediction or misclassification error of a probabilistic approach utilizing Spike-and-Slab priors for sparse deep learning in classification. We establish non-asymptotic results for the prediction error. Additionally, we demonstrate that, by considering different architectures, our results can achieve minimax optimal rates in both low and high-dimensional settings, up to a logarithmic factor. Moreover, our additional logarithmic term yields slight improvements over previous works. Additionally, we propose and analyze an automated model selection approach aimed at optimally choosing a network architecture with guaranteed optimality.
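To make the abstract's ingredients concrete, here is a minimal numerical sketch of a Spike-and-Slab prior over the weights of a small network, combined with a Gibbs-type pseudo-posterior (weights proportional to exp(-lam * empirical risk) times the prior), the standard object in PAC-Bayes analyses. Everything specific here is an assumption, not taken from the paper: a Gaussian slab, one inclusion probability shared by all weights, a ReLU network with a single output score, the hinge loss as the empirical risk, lam equal to the sample size, and crude reweighting of prior draws in place of a proper sampler. All function names are hypothetical.

```python
# Illustrative sketch only: the prior form, the loss, lam and the sampling
# scheme are assumptions, not the paper's implementation.
import numpy as np


def sample_spike_and_slab(shape, inclusion_prob, slab_std, rng):
    """Each entry is exactly 0 (spike) with probability 1 - inclusion_prob,
    otherwise drawn from N(0, slab_std**2) (assumed Gaussian slab)."""
    keep = rng.random(shape) < inclusion_prob
    return np.where(keep, rng.normal(0.0, slab_std, size=shape), 0.0)


def sample_network(layer_sizes, inclusion_prob, slab_std, rng):
    """One prior draw of all weights and biases of a fully connected network."""
    return [
        (sample_spike_and_slab((d_in, d_out), inclusion_prob, slab_std, rng),
         sample_spike_and_slab((d_out,), inclusion_prob, slab_std, rng))
        for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(params, X):
    """ReLU hidden layers, one real-valued output score per row of X."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return (h @ W + b).ravel()


def zero_one_risk(scores, y):
    """Empirical misclassification rate for labels y in {-1, +1}."""
    return float(np.mean(np.where(scores >= 0, 1, -1) != y))


def hinge_risk(scores, y):
    """Empirical hinge risk: mean of max(0, 1 - y * score)."""
    return float(np.mean(np.maximum(0.0, 1.0 - y * scores)))


def gibbs_weights(risks, lam):
    """Normalized weights proportional to exp(-lam * risk) over prior draws."""
    logw = -lam * np.asarray(risks, dtype=float)
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    return w / w.sum()


# Toy data (hypothetical): the label is the sign of the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 0] >= 0, 1, -1)

draws = [sample_network([5, 8, 1], inclusion_prob=0.3, slab_std=1.0, rng=rng)
         for _ in range(500)]
risks = [hinge_risk(forward(p, X), y) for p in draws]
w = gibbs_weights(risks, lam=len(y))  # lam = n is an assumed choice

# Randomized classifier: draw one network according to the weights.
chosen = draws[rng.choice(len(draws), p=w)]
print("0-1 risk of the sampled network:", zero_one_risk(forward(chosen, X), y))
```

Reweighting prior draws is only workable for tiny networks; it is used here because it shows the prior-times-exp(-lam * risk) structure directly. A realistic implementation would instead approximate the pseudo-posterior with MCMC or a variational method.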

What problem does this paper attempt to address?

The paper studies the theoretical foundations of deep learning for classification, specifically bounds on the misclassification rate of Bayesian sparse deep learning. Using PAC-Bayesian bound techniques, the authors analyze a probabilistic (randomized) approach built on a margin-based loss (the hinge loss), with a Spike-and-Slab prior that promotes sparsity of the network parameters. The paper relates the prediction error bounds of this approach to the best achievable error (the Bayes error) and shows that, for suitable choices of architecture, the method attains minimax optimal rates up to a logarithmic factor in both low- and high-dimensional settings. Under a low-noise assumption, two theorems (Theorem 1 and Theorem 3) give prediction error bounds with slow and fast learning rates, respectively. These bounds imply that the method can achieve near-optimal classification performance in both low- and high-dimensional scenarios. The paper also introduces an automated model selection method for choosing the network architecture so as to guarantee optimal performance. The main contributions are:
1. Non-asymptotic prediction error bounds for deep neural network classifiers that apply across different dimensions and network architectures.
2. A characterization of how the prediction error rates of the proposed method relate to the optimal error for specific architectures.
3. An automated model selection strategy that adapts to different complexity requirements.
The paper closes with references to related work that provide background and points of comparison for the theoretical analysis and performance evaluation of deep learning.
Through these theoretical results, researchers and practitioners can better understand how deep learning performs in classification tasks and apply it more effectively.
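The automated model selection step can be pictured as choosing, among candidate architectures, the one that minimizes an estimated risk plus a complexity penalty. The sketch below is hypothetical throughout: the candidate list, the `Candidate` fields, and the penalty S * L * log(n) / n (with S the number of non-zero parameters and L the depth) are assumptions made for illustration, not the paper's selection criterion.

```python
# Hypothetical sketch of penalized architecture selection; the penalty form
# and all names are assumptions, not the paper's criterion.
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Summary of one fitted architecture (hypothetical fields)."""
    depth: int             # L: number of layers
    nonzeros: int          # S: number of non-zero parameters
    empirical_risk: float  # estimated risk of the fitted classifier


def penalty(c, n):
    # Assumed complexity penalty of order S * L * log(n) / n.
    return c.nonzeros * c.depth * math.log(n) / n


def select_architecture(candidates, n):
    """Return the candidate minimizing empirical risk + penalty."""
    return min(candidates, key=lambda c: c.empirical_risk + penalty(c, n))


n = 1_000_000
candidates = [
    Candidate(depth=2, nonzeros=50, empirical_risk=0.20),
    Candidate(depth=3, nonzeros=200, empirical_risk=0.12),
    Candidate(depth=6, nonzeros=2_000, empirical_risk=0.10),
]
best = select_architecture(candidates, n)
print(best)
```

With n = 1,000,000 the three-layer candidate wins the trade-off between fit and complexity; with only 100 samples the penalty dominates and the smallest network is selected instead.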

Misclassification bounds for PAC-Bayesian sparse deep learning

Misclassification excess risk bounds for PAC-Bayesian classification via convexified loss

Sparse Bayesian Approach to Fast Learning Network for Multiclassification

Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds

On high-dimensional classification by sparse generalized Bayesian logistic regression

A sparse PAC-Bayesian approach for high-dimensional quantile prediction

De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors

Improving Generalization of Complex Models under Unbounded Loss Using PAC-Bayes Bounds

Self-Certifying Classification by Linearized Deep Assignment

Layer adaptive node selection in Bayesian neural networks: Statistical guarantees and implementation details

A PAC-Bayesian Link Between Generalisation and Flat Minima

Rethinking Bayesian Learning for Data Analysis: The Art of Prior and Inference in Sparsity-Aware Modeling

Conditionally Gaussian PAC-Bayes

Posterior Concentration for Sparse Deep Learning

A General Framework for the Practical Disintegration of PAC-Bayesian Bounds

High-dimensional prediction for count response via sparse exponential weights

Spike-and-slab shrinkage priors for structurally sparse Bayesian neural networks

High Dimensional Bayesian Network Classification with Network Global-Local Shrinkage Priors

The Case for Bayesian Deep Learning

PAC-Bayesian Theory Meets Bayesian Inference

Higher-Order Generalization Bounds: Learning Deep Probabilistic Programs via PAC-Bayes Objectives