Abstract: Over the last decade, deep learning has driven extraordinary advances in a wide range of tasks, including computer vision, natural language processing, and reinforcement learning. However, it is well known that deep models trained via maximum likelihood estimation tend to be overconfident and give poorly calibrated predictions. Bayesian deep learning attempts to address this by placing priors on the model parameters, which are then combined with a likelihood to perform posterior inference. Unfortunately, for deep models the true posterior is intractable, forcing the user to resort to approximations. In this thesis, we explore the use of variational inference (VI) as an approximation, as it is unique in simultaneously approximating the posterior and providing a lower bound on the marginal likelihood. If tight enough, this lower bound can be used to optimize hyperparameters and to facilitate model selection. However, this capacity has rarely been used to its full extent for Bayesian neural networks, likely because the approximate posteriors typically used in practice can lack the flexibility to bound the marginal likelihood effectively. We therefore explore three aspects of Bayesian learning for deep models: 1) we ask whether it is necessary to perform inference over as many parameters as possible, or whether it is reasonable to treat many of them as optimizable hyperparameters; 2) we propose a variational posterior that provides a unified view of inference in Bayesian neural networks and deep Gaussian processes; 3) we demonstrate how VI can be improved in certain deep Gaussian process models by analytically removing symmetries from the posterior and performing inference on Gram matrices instead of features. We hope that our contributions will provide a stepping stone toward fully realizing the promises of VI in the future.
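
For readers less familiar with VI, the lower bound mentioned in the abstract can be written out using the standard evidence decomposition. The notation here is an assumption for illustration and is not taken from the thesis: y denotes the observed data, \theta the model parameters, p(\theta) the prior, and q(\theta) the approximate posterior.

```latex
\log p(\mathbf{y})
  = \underbrace{\mathbb{E}_{q(\theta)}\!\left[\log p(\mathbf{y} \mid \theta)\right]
      - \mathrm{KL}\!\left(q(\theta) \,\|\, p(\theta)\right)}_{\mathcal{L}(q)}
  + \mathrm{KL}\!\left(q(\theta) \,\|\, p(\theta \mid \mathbf{y})\right)
  \;\ge\; \mathcal{L}(q)
```

Because the final KL term is non-negative, \mathcal{L}(q) never exceeds the log marginal likelihood, and the gap closes only when q(\theta) matches the true posterior. This is why a more flexible q can give a tighter bound, and hence a more reliable objective for hyperparameter optimization and model selection.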

Natural Gradient Hybrid Variational Inference with Application to Deep Mixed Models

A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models

Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations

Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models

Understanding Stochastic Natural Gradient Variational Inference

Gradient-free variational learning with conditional mixture networks

Partially factorized variational inference for high-dimensional mixed models

Wasserstein Gradient Flow over Variational Parameter Space for Variational Inference

A Novel Scalable Semi-supervised GMM and Its Application for Multimode Process Quality Prediction with Big Data

Noisy Natural Gradient As Variational Inference

VI-DGP: A Variational Inference Method with Deep Generative Prior for Solving High-Dimensional Inverse Problems

Efficient Optimization of Variational Autoregressive Networks with Natural Gradient

Towards Improved Variational Inference for Deep Bayesian Models

Amortized Variational Inference for Deep Gaussian Processes

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Natural Gradient Variational Bayes Without Fisher Matrix Analytic Calculation and Its Inversion

Manifold Gaussian Variational Bayes on the Precision Matrix

A Variational Approach for Modeling High-dimensional Spatial Generalized Linear Mixed Models

Variational Stochastic Gradient Descent for Deep Neural Networks

Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes