Abstract: Deep learning has revolutionized the last decade, being at the forefront of extraordinary advances in a wide range of tasks including computer vision, natural language processing, and reinforcement learning, to name but a few. However, it is well known that deep models trained via maximum likelihood estimation tend to be overconfident and give poorly calibrated predictions. Bayesian deep learning attempts to address this by placing priors on the model parameters, which are then combined with a likelihood to perform posterior inference. Unfortunately, for deep models, the true posterior is intractable, forcing the user to resort to approximations. In this thesis, we explore the use of variational inference (VI) as an approximation, as it is unique in simultaneously approximating the posterior and providing a lower bound to the marginal likelihood. If tight enough, this lower bound can be used to optimize hyperparameters and to facilitate model selection. However, this capacity has rarely been used to its full extent for Bayesian neural networks, likely because the approximate posteriors typically used in practice can lack the flexibility to effectively bound the marginal likelihood. We therefore explore three aspects of Bayesian learning for deep models: 1) we ask whether it is necessary to perform inference over as many parameters as possible, or whether it is reasonable to treat many of them as optimizable hyperparameters; 2) we propose a variational posterior that provides a unified view of inference in Bayesian neural networks and deep Gaussian processes; 3) we demonstrate how VI can be improved in certain deep Gaussian process models by analytically removing symmetries from the posterior, and performing inference on Gram matrices instead of features. We hope that our contributions will provide a stepping stone to fully realize the promises of VI in the future.
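
For reference, the lower bound mentioned in the abstract is commonly written as below. This is the standard textbook identity, not a formula taken from the thesis; the symbols are assumptions chosen here, with $\mathcal{D}$ for the data, $\theta$ for the model parameters, $p(\theta)$ for the prior, and $q(\theta)$ for the approximate posterior.

```latex
\log p(\mathcal{D})
  = \underbrace{\mathbb{E}_{q(\theta)}\!\left[\log p(\mathcal{D}\mid\theta)\right]
      - \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta)\right)}_{\mathcal{L}(q)}
    + \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta\mid\mathcal{D})\right)
  \;\ge\; \mathcal{L}(q)
```

Because the last KL term is non-negative and vanishes only when $q(\theta)$ equals the true posterior, the gap between $\mathcal{L}(q)$ and the log marginal likelihood is exactly the approximation error of $q$. This is why an inflexible approximate posterior yields a loose bound, which in turn makes the bound less reliable for hyperparameter optimization and model selection.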

What problem does this paper attempt to address?

The paper addresses the limited effectiveness and flexibility of variational inference (VI) in deep Bayesian models. Specifically:

1. **Overfitting and pathological behavior**: The paper asks whether it is reasonable to perform partial Bayesian inference in deep models (that is, to infer only some parameters and treat the rest as optimizable hyperparameters), or whether inference should cover as many parameters as possible. The study finds that partial Bayesian inference may lead to severe overfitting and pathological behavior, so it recommends being as fully Bayesian as possible.
2. **Flexibility of the variational posterior**: The paper proposes a new form of variational posterior that handles inference in Bayesian neural networks and deep Gaussian processes within a single framework, and that is flexible enough to take advantage of prior hyperparameters.
3. **Symmetries in the posterior**: The paper shows how VI can be improved in certain deep Gaussian process models by analytically removing symmetries from the posterior. The specific approach is to perform inference on Gram matrices rather than on features.

Through these studies, the author hopes to lay the groundwork for more fully realizing the potential of variational inference in the future, especially for model selection and hyperparameter optimization.
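
The third point above, about symmetries and Gram matrices, can be illustrated with a toy example. The sketch below is not the thesis's method: it assumes, for illustration only, that the symmetry in question is a rotation of the feature space and that the Gram matrix is the plain inner-product matrix of the features. The function names `gram` and `rotate` and the sample values are hypothetical.

```python
import math


def gram(features):
    """Return the Gram matrix G[i][j] = <f_i, f_j> for a list of feature rows."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def rotate(features, theta):
    """Apply the same 2-D rotation by angle theta to every feature row."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y] for x, y in features]


# Three data points with two-dimensional features (arbitrary toy values).
features = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
rotated = rotate(features, 0.7)

g_original = gram(features)
g_rotated = gram(rotated)

# Largest elementwise difference between the two Gram matrices.
max_gap = max(abs(a - b)
              for row_a, row_b in zip(g_original, g_rotated)
              for a, b in zip(row_a, row_b))

# Whether the rotation actually changed the feature values.
features_changed = any(abs(a - b) > 1e-6
                       for row_a, row_b in zip(features, rotated)
                       for a, b in zip(row_a, row_b))

print("features changed:", features_changed)
print("max Gram matrix difference:", max_gap)
```

Under these assumptions, the rotated features differ from the originals while the Gram matrix stays the same up to floating-point error. Many distinct feature configurations therefore correspond to a single Gram matrix, which gives some intuition for why inference over Gram matrices avoids having to represent that redundancy in the approximate posterior.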

Towards Improved Variational Inference for Deep Bayesian Models

Bayesian Inference and Deep Learning for Inverse Problems

Variational Inference for Bayesian Neural Networks under Model and Parameter Uncertainty

Deterministic Variational Inference for Robust Bayesian Neural Networks

Variational Inference on the Final-Layer Output of Neural Networks

Variational Inference: A Review for Statisticians

Amortized Variational Inference for Deep Gaussian Processes

Neural Variational Inference and Learning in Belief Networks

Variational Bayesian inference with stochastic search

Variational Gibbs Inference for Statistical Model Estimation from Incomplete Data

A Primer on Variational Inference for Physics-Informed Deep Generative Modelling

Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes

Variational Inference for Nonlinear Inverse Problems via Neural Net Kernels: Comparison to Bayesian Neural Networks, Application to Topology Optimization

Variational Learning of Bayesian Neural Networks Via Bayesian Dark Knowledge

Variational inference: uncertainty quantification in additive models

Variational Autoencoders for Efficient Simulation-Based Inference

Variational Inference for Uncertainty on the Inputs of Gaussian Process Models

Auto-Encoding Variational Bayes

On the Convergence of Extended Variational Inference for Non-Gaussian Statistical Models

Adaptive variational Bayes: Optimality, computation and applications

Variational Bayesian Bow tie Neural Networks with Shrinkage