(To save time, have a look at the cartoon illustrations for a quick taste 😆)
"Bayesian Inference is **the natural choice** for online/incremental/continual learning." Conceptually this is nice, but deep learning engineers wouldn't buy into it, as exact Bayesian inference takes "forever" to run. In this paper we simply use variational Bayes to make it tractable and easy to implement for continual learning of deep models, in both supervised and unsupervised settings.
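The recursion behind the quote is posterior_t(θ) ∝ posterior_{t-1}(θ) · p(D_t | θ). After each task, the current posterior becomes the prior for the next one, and variational Bayes replaces the exact posterior with an approximation q_t. The sketch below checks the exact recursion in the one setting where it is cheap: conjugate Bayesian linear regression with a known noise variance. This toy model, the data and the helper name `posterior_update` are illustrative assumptions, not the paper's models. Learning three tasks one after another should give the same posterior as seeing all the data at once.

```python
import numpy as np

def posterior_update(prior_mean, prior_cov, X, y, noise_var):
    """Exact Gaussian posterior for y = X w + eps, eps ~ N(0, noise_var)."""
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

rng = np.random.default_rng(0)
d, noise_var = 3, 0.1
w_true = rng.normal(size=d)

# Three hypothetical "tasks" arriving one after another.
tasks = []
for _ in range(3):
    X = rng.normal(size=(20, d))
    y = X @ w_true + np.sqrt(noise_var) * rng.normal(size=20)
    tasks.append((X, y))

# Online learning: yesterday's posterior is today's prior.
mean, cov = np.zeros(d), np.eye(d)
for X, y in tasks:
    mean, cov = posterior_update(mean, cov, X, y, noise_var)

# Batch learning: all data at once, starting from the original prior.
X_all = np.vstack([X for X, _ in tasks])
y_all = np.concatenate([y for _, y in tasks])
batch_mean, batch_cov = posterior_update(np.zeros(d), np.eye(d), X_all, y_all, noise_var)

print(np.allclose(mean, batch_mean), np.allclose(cov, batch_cov))
```

For a deep network the per-task update has no closed form, which is where the variational approximation enters.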
(So why not try approximating gradients instead? 😎)
One of the main reasons people are now so excited about GANs is that the generative model is specified implicitly, by a neural network that warps simple noise vectors. However, the density of such an implicit model is usually intractable, so applying probabilistic learning methods to it requires severe approximations, and an estimate of the optimisation objective that overfits can lead to arbitrarily bad models. In this work, we explore an alternative route -- approximating the gradient function (the gradient of the model's log density) -- for training implicit models. We demonstrate the efficacy of our method with some exciting applications: meta-learning for posterior samplers, and improving the sample diversity of GANs.
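The key move is to estimate ∇_x log q(x) from samples alone, never evaluating q. One such estimator is a kernel-based Stein gradient estimator, Ĝ = -(K + ηI)^{-1} ⟨∇, K⟩, where K is the kernel matrix on the samples and ⟨∇, K⟩_i = Σ_j ∂K(x_i, x_j)/∂x_j. The sketch below implements that formula. It is a sketch under stated assumptions, not the released code: the RBF kernel, the median-heuristic bandwidth, the ridge parameter `eta` and the function name are choices made here for illustration.

```python
import numpy as np

def stein_gradient_estimator(x, eta=1.0, bandwidth=None):
    """Estimate grad_x log q(x) at samples x of shape (n, d), using only the samples.

    G_hat = -(K + eta * I)^{-1} <grad, K>,  with <grad, K>_i = sum_j dK(x_i, x_j)/dx_j.
    """
    diff = x[:, None, :] - x[None, :, :]      # (n, n, d): entry [i, j] is x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)      # (n, n) squared distances
    if bandwidth is None:
        # Median heuristic for the RBF bandwidth (an assumption; other choices are common).
        bandwidth = np.sqrt(0.5 * np.median(sq_dist))
    kernel = np.exp(-sq_dist / (2 * bandwidth ** 2))
    # For the RBF kernel: dK(x_i, x_j)/dx_j = K(x_i, x_j) * (x_i - x_j) / bandwidth^2.
    grad_k = np.sum(kernel[:, :, None] * diff, axis=1) / bandwidth ** 2   # (n, d)
    n = x.shape[0]
    return -np.linalg.solve(kernel + eta * np.eye(n), grad_k)

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 1))           # standard normal, whose true score is -x
g_hat = stein_gradient_estimator(samples)
print(np.corrcoef(g_hat[:, 0], -samples[:, 0])[0, 1])
```

In a training loop, such an estimate would stand in for the intractable score term of the implicit model's objective. The regulariser `eta` trades variance against shrinkage of the estimate.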
(Yes, this is the whole idea, not much need to explain it in equations.)
Amortised MCMC is an extremely flexible approximate inference framework: it is completely up to you to specify the sample generator, the Markov chain transition kernel, and the update rule for that generator. In the experiments, we defined the sample generator by warping simple random noise through a deep neural network, and used a GAN-like update rule acting in latent space. I'm very excited about this work and would like to see extensions to discrete distributions as well!
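The three ingredients (generator, transition kernel, update rule) fit in a deliberately tiny sketch. Several assumptions keep it small. The target is a 1-D Gaussian known only through its score. The generator is an affine map of Gaussian noise rather than a deep network. The kernel is a few steps of unadjusted Langevin dynamics. The update rule refits the generator by maximum likelihood to the MCMC-improved samples, which is not the GAN-like rule used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy target p(x) = N(3, 0.5^2), accessed only through grad log p.
TARGET_MEAN, TARGET_STD = 3.0, 0.5

def grad_log_p(x):
    return -(x - TARGET_MEAN) / TARGET_STD ** 2

def langevin_steps(x, n_steps=10, step=0.01):
    """Transition kernel: unadjusted Langevin dynamics (a swapped-in choice)."""
    for _ in range(n_steps):
        x = x + step * grad_log_p(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
    return x

# Generator q_phi: x = mu + s * z with z ~ N(0, 1).
mu, s = 0.0, 2.0
for _ in range(200):
    z = rng.normal(size=2000)
    x0 = mu + s * z                  # 1. sample from the generator
    xT = langevin_steps(x0)          # 2. improve the samples with a few MCMC steps
    mu, s = xT.mean(), xT.std()      # 3. update rule: refit q_phi to the improved samples

print(mu, s)
```

A naive alternative that regresses x_T back onto its own noise z by squared error would shrink `s` towards zero here, because the Langevin noise is independent of z. This is one reason the choice of update rule matters.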
It would be fantastic if we could get the "lion king" (a neural network transform) to help beat the "demon dragon" (the exact posterior)! In this preliminary work we discussed several proposals for training a "wild approximation" q, for which prediction is tractable but the density of q may be difficult to evaluate. We presented it at the NIPS 2016 approximate inference workshop, with early results on toy examples and generative models.
Refereed conference papers
Yingzhen Li and Yarin Gal. Dropout Inference in Bayesian Neural Networks with Alpha-divergences. International Conference on Machine Learning (ICML), 2017.
(Code for the core idea in Keras. With a few helper functions, dropout + BB-alpha can be implemented in fewer than 30 lines.)
(A Bayesian NN can detect adversarial examples (targeted FGS attack).)
Show me your loss function, and I will return an objective function that trains a Bayesian NN with BB-alpha and dropout! We extended the dropout variational inference idea to minimising alpha-divergences, and the proposed re-formulation can be implemented very efficiently in any high-level deep learning framework. You can try any dropout technique (Bernoulli, Gaussian, Concrete, DropConnect, and even stochastic depth) and select the best alpha for your needs. As a side project, we also showed that Bayesian NNs can be used to detect adversarial examples under both untargeted and targeted attacks.
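Here is a minimal NumPy sketch of the data term. It assumes the form -(1/α) Σ_n log[(1/K) Σ_k p(y_n | x_n, ω_k)^α], where ω_k comes from K stochastic dropout forward passes. The regularisation term on the weights is omitted. The toy linear model, dropout rate, noise level and the name `bb_alpha_data_term` are hypothetical. Any per-example log-likelihood (the negative of "your loss function") can be plugged in as `log_lik`.

```python
import numpy as np

def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))

def bb_alpha_data_term(log_lik, alpha):
    """-(1/alpha) * sum_n log( (1/K) * sum_k p(y_n | x_n, omega_k)^alpha ).

    log_lik has shape (K, N): row k holds log p(y_n | x_n, omega_k) for the
    k-th dropout mask omega_k. Requires alpha != 0.
    """
    K = log_lik.shape[0]
    per_point = (logsumexp(alpha * log_lik, axis=0) - np.log(K)) / alpha
    return -np.sum(per_point)

# Hypothetical toy regression model: linear predictor with Bernoulli dropout on inputs.
rng = np.random.default_rng(0)
N, D, K, p_drop, noise_std = 50, 5, 10, 0.5, 1.0
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)
W = rng.normal(size=D)  # current weights (these would be trained by minimising the objective)

masks = rng.binomial(1, 1 - p_drop, size=(K, D)) / (1 - p_drop)  # K inverted-dropout masks
preds = (X[None, :, :] * masks[:, None, :]) @ W                  # shape (K, N)
log_lik = (-0.5 * np.log(2 * np.pi * noise_std ** 2)
           - 0.5 * (y[None, :] - preds) ** 2 / noise_std ** 2)   # Gaussian log-likelihood

for alpha in (1e-3, 0.5, 1.0):
    print(alpha, bb_alpha_data_term(log_lik, alpha))
```

As alpha → 0, the per-point term tends to the average log-likelihood over the K masks, the usual MC dropout (variational) objective. At alpha = 1 it becomes the log of the averaged predictive likelihood.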
Yingzhen Li and Richard E. Turner. Rényi Divergence Variational Inference. Neural Information Processing Systems (NIPS), 2016. (Previously titled "Variational Inference with Rényi Divergence")
(Credit to Ferenc Huszár; thanks for the fun emoji illustration!)
(mean-field approximation for Bayesian linear regression)
The goal of approximate inference is to obtain accurate posterior approximations quickly. However, the definition of "accurate" depends on what you actually want. You, as the user, specify the approximation family that suits your constraints, and my job here is to provide a flexible inference framework that lets you tune the alpha parameter to get the best result for your needs. No more painstaking derivations, again and again!
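Tuning alpha can be made concrete with the variational Rényi bound L_α = 1/(1-α) log E_q[(p(x,z)/q(z))^{1-α}]. This bound recovers the usual ELBO as α → 1 and equals log p(x) at α = 0. The sketch below estimates it by Monte Carlo on a toy conjugate Gaussian model where log p(x) is known in closed form. The model, sample size and helper names are illustrative assumptions.

```python
import numpy as np

def log_normal(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def vr_bound(log_w, alpha):
    """Monte Carlo estimate of L_alpha = 1/(1 - alpha) * log E_q[w^(1 - alpha)],
    from log importance weights log_w = log p(x, z_k) - log q(z_k), z_k ~ q.
    alpha = 1 is treated as the limit: the ELBO estimate mean(log_w)."""
    if np.isclose(alpha, 1.0):
        return np.mean(log_w)
    a = (1.0 - alpha) * log_w
    a_max = np.max(a)
    return (a_max + np.log(np.mean(np.exp(a - a_max)))) / (1.0 - alpha)

# Toy conjugate model (hypothetical): z ~ N(0, 1), x | z ~ N(z, 1), hence p(x) = N(0, 2).
x_obs = 1.5
rng = np.random.default_rng(0)

def log_weights(q_mean, q_var, n=5000):
    z = q_mean + np.sqrt(q_var) * rng.normal(size=n)
    log_joint = log_normal(z, 0.0, 1.0) + log_normal(x_obs, z, 1.0)
    return log_joint - log_normal(z, q_mean, q_var)

alphas = (-1.0, 0.0, 0.5, 1.0)
# If q is the exact posterior N(x/2, 1/2), every alpha returns log p(x).
exact = log_weights(x_obs / 2, 0.5)
print([round(vr_bound(exact, a), 6) for a in alphas], round(log_normal(x_obs, 0.0, 2.0), 6))

# With a mismatched q (here the prior), the estimates decrease as alpha grows.
rough = log_weights(0.0, 1.0)
print([round(vr_bound(rough, a), 3) for a in alphas])
```

For a fixed set of samples, the estimate is the log of a power mean of the weights with order 1 - α. That is why it is monotone in alpha: smaller alpha gives a more optimistic (larger) value.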
José Miguel Hernández-Lobato*, Yingzhen Li*, Mark Rowland, Daniel Hernández-Lobato, Thang Bui and Richard E. Turner. Black-box α-divergence Minimization. International Conference on Machine Learning (ICML), 2016.
A taste of alpha-divergences. (Original visualisation credit: Tom Minka)
Connections to existing methods.
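The alpha-divergence visualisation contrasts how alpha changes the fit, and a small numerical sketch reproduces its flavour. Assumptions: a hypothetical two-mode target p, a Gaussian q, and Amari's alpha-divergence D_α(p||q) = 1/(α(1-α)) ∫ [α p + (1-α) q - p^α q^{1-α}] dx computed on a grid. A brute-force search runs over q's mean and standard deviation. With alpha near 0 (close to KL(q||p)), the fit should lock onto one mode. With alpha near 1 (close to KL(p||q)), it should stretch to cover both.

```python
import numpy as np

# Hypothetical bimodal target on a grid: p = 0.5 N(-2, 0.5^2) + 0.5 N(2, 0.5^2).
grid = np.linspace(-15, 15, 3001)
dx = grid[1] - grid[0]

def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

p = 0.5 * normal_pdf(grid, -2, 0.5) + 0.5 * normal_pdf(grid, 2, 0.5)

def alpha_divergence(p, q, alpha):
    """Amari alpha-divergence D_alpha(p || q) for 0 < alpha < 1, by quadrature.
    alpha -> 1 approaches KL(p || q); alpha -> 0 approaches KL(q || p)."""
    integrand = alpha * p + (1 - alpha) * q - p ** alpha * q ** (1 - alpha)
    return np.sum(integrand) * dx / (alpha * (1 - alpha))

def best_gaussian_fit(alpha):
    """Brute-force grid search over the mean and std of a Gaussian q."""
    best = None
    for m in np.linspace(-3, 3, 61):
        for s in np.linspace(0.2, 3.0, 57):
            d = alpha_divergence(p, normal_pdf(grid, m, s), alpha)
            if best is None or d < best[0]:
                best = (d, m, s)
    return best[1], best[2]

for alpha in (0.05, 0.5, 0.95):
    m, s = best_gaussian_fit(alpha)
    print(f"alpha={alpha}: mean={m:.2f}, std={s:.2f}")
```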
Thang Bui, Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Yingzhen Li and Richard E. Turner. Deep Gaussian Processes for Regression using Approximate Expectation Propagation. International Conference on Machine Learning (ICML), 2016.
(Fig credit: Wikipedia pages & Thang Bui)
Under certain conditions, a Gaussian process model can be viewed as a one-hidden-layer neural network with an infinite number of hidden units. So it makes sense to make GPs "deep", and there you have deep Gaussian processes. However, unlike for regular deep neural networks, inference in deep GPs is non-trivial, so this paper introduces a scalable inference method for them based on approximate expectation propagation. GPs are state-of-the-art regression models, and our contribution makes them even better!
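Making a GP "deep" means feeding the output of one GP into another. The sketch below draws a single function from a two-layer deep GP prior. It assumes squared-exponential kernels with hand-picked lengthscales and a small diagonal jitter; these are illustrative choices, not the paper's settings. Inference, the hard part this paper addresses, is not shown.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def sample_gp(inputs, rng, jitter=1e-6, **kernel_args):
    """Draw one function sample f(inputs) from a zero-mean GP prior."""
    cov = rbf_kernel(inputs, inputs, **kernel_args) + jitter * np.eye(len(inputs))
    chol = np.linalg.cholesky(cov)   # jitter keeps the covariance numerically positive definite
    return chol @ rng.normal(size=len(inputs))

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
h = sample_gp(x, rng)                      # layer 1: hidden function h = f1(x)
y = sample_gp(h, rng, lengthscale=0.5)     # layer 2: output y = f2(h) = f2(f1(x))
print(h.shape, y.shape)                    # prints (200,) (200,)
```

Each layer is Gaussian given its inputs. The composition y = f2(f1(x)), however, is no longer a Gaussian process in x, which is why exact inference becomes intractable.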
Yingzhen Li and Richard E. Turner. A Unifying Approximate Inference Framework from Variational Free Energy Relaxation. NIPS Advances in approximate inference workshop, 2016
Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Yingzhen Li, Thang Bui and Richard E. Turner. Stochastic Expectation Propagation for Large Scale Gaussian Process Classification. NIPS Advances in approximate inference workshop (contributed talk), 2015
Yingzhen Li and Ye Zhang. Generating ordered list of Recommended Items: a Hybrid Recommender System of Microblog. 2012