(To save time, you are encouraged to look at the cartoon illustrations for a taste 😆)
Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin and Katja Hofmann. Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck. Accepted at NeurIPS 2019.
Yingzhen Li. Approximate Gradient Descent for Training Implicit Generative Models. NIPS 2017 Bayesian Deep Learning workshop. 2017.
(Yes, this is the whole idea; not much need to explain it in equations.)
Amortised MCMC is an extremely flexible approximate inference framework. It is completely up to you to specify the sample generator, the Markov chain transition kernel, and the update rule for that generator. In the experiments, we specifically defined the sample generating process by warping simple randomness through a deep neural network, and the update rule was a GAN-like method but acting in latent space. I'm very excited about this work and would like to see extensions to discrete distributions as well!
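To make the three ingredients (sample generator, transition kernel, update rule) concrete, here is a toy sketch. Every choice in it is an assumption for illustration, not the setup of the paper: a 1-D Gaussian target N(3, 0.5²), a Gaussian generator `x = mu + sigma * eps` instead of a deep network, a few unadjusted Langevin steps as the kernel, and a damped maximum-likelihood fit of the generator to the improved samples (a step towards minimising KL[q_T || q_phi]) instead of the GAN-like latent-space update.

```python
import numpy as np

TARGET_MEAN, TARGET_STD = 3.0, 0.5  # assumed toy target: N(3, 0.5^2)

def grad_log_target(x):
    # Score function of the Gaussian target
    return -(x - TARGET_MEAN) / TARGET_STD**2

def langevin_kernel(x, rng, n_steps=5, step=0.01):
    # Unadjusted Langevin transitions (no Metropolis correction),
    # so the fixed point is slightly biased for a finite step size
    for _ in range(n_steps):
        x = x + step * grad_log_target(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

def amortised_mcmc(n_iters=1000, batch=256, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0  # generator q_phi: x = mu + sigma * eps, eps ~ N(0, 1)
    for _ in range(n_iters):
        x = mu + sigma * rng.standard_normal(batch)   # 1. sample from the generator
        x_improved = langevin_kernel(x, rng)          # 2. improve with a few MCMC steps
        # 3. update rule: move the generator towards the maximum-likelihood
        #    Gaussian fit of the improved samples (damped by lr)
        mu = (1 - lr) * mu + lr * x_improved.mean()
        sigma = (1 - lr) * sigma + lr * x_improved.std()
    return mu, sigma

mu, sigma = amortised_mcmc()
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

The generator parameters should drift from (0, 1) to roughly (3, 0.5). A naive per-sample regression of `x` onto `x_improved` would shrink `sigma` towards zero here, because the Langevin noise is independent of `eps`; that is one reason the update should match distributions rather than individual samples.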
Refereed conference papers
Yingzhen Li, John Bradshaw and Yash Sharma. Are Generative Classifiers More Robust to Adversarial Attacks? International Conference on Machine Learning (ICML), 2019. code
(finding the best one from the largest set of provably correct samplers 😆)
(the meta-learned sampler is trained on an isotropic Gaussian distribution)
Recently people have been very excited about meta-learning, where the learned algorithms are then used to optimise ML models. However, naively proposing an arbitrary neural network to do sampling is almost guaranteed to fail. So we want to meta-learn an MCMC sampler that is provably correct (i.e. its stationary distribution is the target distribution), and we build on a seminal work on SG-MCMC to combine the flexibility of neural networks with Hamiltonian Monte Carlo.
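For reference, here is plain SGHMC with hand-set constants, run on an isotropic Gaussian target like the one used for meta-training. The step size, friction, number of chains and initialisation are arbitrary assumptions. Roughly speaking, the meta-learned sampler replaces hand-set quantities such as the friction with neural-network outputs, under constraints that keep the target invariant; this sketch does not attempt that part.

```python
import numpy as np

def grad_U(theta):
    # U(theta) = -log N(theta; 0, I) up to a constant, so grad U = theta
    return theta

def sghmc(n_chains=2000, dim=2, n_steps=500, step=0.1, friction=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.full((n_chains, dim), 3.0)   # deliberately poor initialisation
    p = np.zeros((n_chains, dim))           # auxiliary momentum variables
    for _ in range(n_steps):
        noise = np.sqrt(2.0 * friction * step) * rng.standard_normal(p.shape)
        # Momentum: target force, friction and matching injected noise.
        # In SG-MCMC, grad_U would be a minibatch estimate instead of the exact gradient.
        p = p - step * grad_U(theta) - step * friction * p + noise
        # Position follows the momentum (the Hamiltonian coupling of theta and p)
        theta = theta + step * p
    return theta

samples = sghmc()
print(samples.mean(axis=0), samples.var(axis=0))
```

The final states of the parallel chains should have mean near 0 and variance near 1 in each dimension, with a small bias from the finite step size.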
(embarrassingly simple 🙈)
While there are quite a few recent papers on learning disentangled representations using some new loss function, I thought: why not just go back to graphical-model land and directly enforce disentanglement in the model design? So there you go: a minimalistic generative model for disentangling content and dynamics in a sequence. We have a speech recognition example in the paper as well, and we also show that using stochastic dynamics could potentially achieve better compression.
Part of this project was done while I was an intern at Disney Research.
(Previously titled "A Deep Generative Model for Disentangled Representations of Sequential Data")
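A minimal sketch of "disentanglement by design", assuming the model factorises roughly as p(f) ∏_t p(z_t | z_{<t}) p(x_t | z_t, f): one time-invariant content variable f and per-frame dynamics variables z_t. The linear-Gaussian dynamics and the linear decoder weights `W_f`, `W_z` are hypothetical stand-ins for the neural networks used in practice. At the end, the content is swapped while the dynamics are kept fixed.

```python
import numpy as np

def sample_dynamics(T, z_dim, rng, transition=0.9, noise_std=0.3):
    # Per-frame dynamics latents from a simple (hypothetical) linear-Gaussian Markov chain
    z = np.zeros((T, z_dim))
    z[0] = rng.standard_normal(z_dim)
    for t in range(1, T):
        z[t] = transition * z[t - 1] + noise_std * rng.standard_normal(z_dim)
    return z

def decode(f, z, W_f, W_z, rng, obs_std=0.0):
    # Every frame x_t sees the same content f and its own dynamics z_t
    mean = f @ W_f + z @ W_z        # (x_dim,) broadcasts against (T, x_dim)
    return mean + obs_std * rng.standard_normal(mean.shape)

rng = np.random.default_rng(0)
T, f_dim, z_dim, x_dim = 8, 4, 2, 10
W_f = rng.standard_normal((f_dim, x_dim))   # hypothetical content decoder weights
W_z = rng.standard_normal((z_dim, x_dim))   # hypothetical dynamics decoder weights

z = sample_dynamics(T, z_dim, rng)                                  # one shared trajectory
f_a, f_b = rng.standard_normal(f_dim), rng.standard_normal(f_dim)   # two different contents
x_a = decode(f_a, z, W_f, W_z, rng)
x_b = decode(f_b, z, W_f, W_z, rng)
# Same dynamics, different content: the sequences differ by one offset in every frame
diff = x_a - x_b
```

With a nonlinear decoder the difference would no longer be a constant offset, but content would still reach every frame only through the shared f, which is what the model structure enforces.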
"Bayesian inference is **the natural choice** for online/incremental/continual learning." Conceptually this is nice, but you know deep learning engineers wouldn't buy it, as exact Bayesian inference takes "forever" to run. In this paper we simply use variational Bayes to make it tractable and easy to implement for continual learning of deep learning tasks, for both supervised and unsupervised learning.
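The "posterior after one task becomes the prior for the next" recursion is easiest to see in a conjugate model, where it is exact. The sketch below assumes Bayesian linear regression with a known noise variance; it is an illustration of the recursion only, not the paper's method, which replaces the exact update with a variational approximation so that it scales to deep networks. Processing three tasks one after another gives the same posterior as one update on all the data.

```python
import numpy as np

def posterior_update(prior_mean, prior_cov, X, y, noise_var=0.25):
    # Exact Gaussian posterior for y = X w + noise, with prior w ~ N(prior_mean, prior_cov)
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

rng = np.random.default_rng(0)
d = 3
w_true = rng.standard_normal(d)
tasks = []
for _ in range(3):  # three "tasks" arriving one after another
    X = rng.standard_normal((20, d))
    y = X @ w_true + 0.5 * rng.standard_normal(20)  # noise std 0.5, i.e. variance 0.25
    tasks.append((X, y))

# Continual learning: the previous posterior is used as the next prior
mean, cov = np.zeros(d), np.eye(d)
for X, y in tasks:
    mean, cov = posterior_update(mean, cov, X, y)

# Reference: a single update on all tasks' data at once
X_all = np.vstack([X for X, _ in tasks])
y_all = np.concatenate([y for _, y in tasks])
batch_mean, batch_cov = posterior_update(np.zeros(d), np.eye(d), X_all, y_all)
```

Once the exact update is replaced by an approximate one, the two routes no longer agree exactly, and the quality of each approximate posterior determines how much is forgotten.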
(So why not try gradient approximations instead 😎)
One of the main reasons why people are now super excited about GANs is that the generative model is implicitly specified by a neural network warping simple noise vectors. However, applying probabilistic learning methods to these models requires severe approximations, and an approximation of the optimisation objective that overfits can lead to arbitrarily bad models. In this work, we explore an alternative route for training implicit models: approximating the gradient function. We demonstrate the efficacy of our method with some exciting applications: meta-learning for posterior samplers, and improving GANs' sample diversity.
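One way to approximate the gradient function is a kernel estimator derived from Stein's identity; the sketch below illustrates that idea and is not necessarily the exact estimator of the paper. Given samples x_1, ..., x_N from an implicit q, it estimates ∇_x log q(x_i) at the samples as G ≈ -(K + ηI)^{-1} ⟨∇, K⟩, where K is the kernel matrix and ⟨∇, K⟩_i = Σ_k ∇_{x_k} K(x_i, x_k). The RBF kernel, median-heuristic bandwidth and η = 1 are assumptions. Only samples are used, never the density of q; here they secretly come from N(0, 1), so the estimate can be checked against the true score -x.

```python
import numpy as np

def stein_gradient_estimator(X, eta=1.0):
    """Estimate grad_x log q(x) at each row of X (shape N x d) from samples alone."""
    diffs = X[:, None, :] - X[None, :, :]          # (N, N, d): x_i - x_k
    sq_dists = np.sum(diffs**2, axis=-1)
    iu = np.triu_indices(len(X), k=1)
    sigma = np.median(np.sqrt(sq_dists[iu]))        # median-heuristic bandwidth
    K = np.exp(-sq_dists / (2 * sigma**2))          # RBF kernel matrix
    # <grad, K>_i = sum_k d/dx_k K(x_i, x_k) = sum_k K_ik (x_i - x_k) / sigma^2
    grad_K = np.sum(K[:, :, None] * diffs, axis=1) / sigma**2
    return -np.linalg.solve(K + eta * np.eye(len(X)), grad_K)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))   # samples from an "implicit" model (secretly N(0, 1))
g_hat = stein_gradient_estimator(X)
true_grad = -X                      # exact score of N(0, 1), used only for checking
print(np.corrcoef(g_hat[:, 0], true_grad[:, 0])[0, 1])
```

For a generator x = g_phi(eps), the gradient of the entropy of q only needs ∇_x log q(x) at generated samples, which is where such an estimate can be plugged in, e.g. to encourage sample diversity.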
Yingzhen Li and Yarin Gal. Dropout Inference in Bayesian Neural Networks with Alpha-divergences. International Conference on Machine Learning (ICML), 2017. code cartoon
(Code for the core idea in Keras. With other helper functions, dropout+BB-alpha can be implemented in fewer than 30 lines.)
(Bayesian NN could detect adversarial examples (FGS targeted attack))
Show me your loss function, and I will return you an objective function that trains a Bayesian NN with BB-alpha and dropout! We extended the dropout variational inference idea to minimising alpha-divergences, and the proposed re-formulation can be implemented very efficiently in any high-level deep learning framework. You can try any dropout technique (Bernoulli, Gaussian, Concrete, dropConnect, and even stochastic depth) and select the best alpha for your needs. As a side project we also showed that Bayesian NNs can be used to detect adversarial examples, in both untargeted and targeted attacks.
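Here is a NumPy sketch of the data term of the BB-alpha dropout objective (not the released Keras code): given K stochastic forward passes with different dropout masks ω_k, each datapoint contributes -(1/α) log((1/K) Σ_k p(y_n | x_n, ω_k)^α). The function name and the (N, K) input layout are assumptions, and the usual L2 weight-decay term would be added separately.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along an axis
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def bb_alpha_dropout_loss(mc_log_lik, alpha=0.5):
    """mc_log_lik: (N, K) array of log p(y_n | x_n, omega_k) for K dropout masks."""
    K = mc_log_lik.shape[1]
    # -(1/alpha) * log( (1/K) * sum_k p(y_n | x_n, omega_k)^alpha ), summed over datapoints
    per_point = -(logsumexp(alpha * mc_log_lik, axis=1) - np.log(K)) / alpha
    return per_point.sum()
```

Since the loss only needs per-sample log-likelihoods, any dropout variant can supply them. As α → 0 the data term tends to the average negative log-likelihood over masks (the usual MC dropout / VI term), while α = 1 gives the negative log of the MC-averaged likelihood.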
(credit to Ferenc Huszár, thanks for your fun emoji illustration!)
(mean-field approximation for Bayesian linear regression)
The goal of approximate inference is to get accurate posterior approximations fast. However, the definition of "accurate" really depends on what you actually want. You, the user, specify the approximation family that suits your constraints, and my job here is to provide you with a flexible inference framework that allows you to fine-tune the alpha parameter to get the best for your needs. No more painstaking derivations again and again!
(Previously titled "Variational Inference with Rényi Divergence")
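To make the role of alpha concrete, here is a Monte Carlo estimate of the Rényi variational bound L_α = 1/(1-α) log E_q[(p(x, z)/q(z))^{1-α}]. The toy conjugate model z ~ N(0, 1), x | z ~ N(z, 1) is an assumption chosen because log p(x) is known in closed form. α → 1 recovers the usual ELBO, smaller α gives larger bounds, and when q is the exact posterior every α returns log p(x).

```python
import numpy as np

def log_normal(x, mean, var):
    # Log density of N(mean, var)
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def vr_bound(x, q_mean, q_var, alpha, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    z = q_mean + np.sqrt(q_var) * rng.standard_normal(n_samples)   # z ~ q
    # Log importance weights: log p(x, z) - log q(z)
    log_w = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0) - log_normal(z, q_mean, q_var)
    if np.isclose(alpha, 1.0):
        return np.mean(log_w)          # alpha -> 1: the standard ELBO
    a = (1.0 - alpha) * log_w
    m = np.max(a)                      # log-mean-exp with the max subtracted for stability
    return (m + np.log(np.mean(np.exp(a - m)))) / (1.0 - alpha)

x = 1.5
log_px = log_normal(x, 0.0, 2.0)       # exact log evidence: marginally x ~ N(0, 2)
for alpha in [0.0, 0.5, 1.0]:
    print(alpha, vr_bound(x, 0.0, 1.0, alpha), log_px)
```

With the same samples, the estimate is non-increasing in alpha (a power-mean inequality), so alpha acts as a single knob between mass-covering and mode-seeking behaviour without re-deriving anything.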
José Miguel Hernández-Lobato*, Yingzhen Li*, Mark Rowland, Daniel Hernández-Lobato, Thang Bui and Richard E. Turner. Black-box α-divergence Minimization. International Conference on Machine Learning (ICML), 2016. code cartoon
A taste of alpha-divergences. (Original visualisation credit: Tom Minka)
Connections to existing methods.
Thang Bui, Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Yingzhen Li and Richard E. Turner. Deep Gaussian Processes for Regression using Approximate Expectation Propagation. International Conference on Machine Learning (ICML), 2016. code cartoon
(Fig credit: Wikipedia pages & Thang Bui)
Under some conditions, a Gaussian process model can be viewed as a one-hidden-layer neural network with an infinite number of hidden units. So it makes sense to make GPs "deep", and there you go: deep Gaussian processes. However, unlike for regular deep neural networks, inference for deep GPs is non-trivial, so this paper introduces a scalable inference method for them based on approximate expectation propagation. GPs are state-of-the-art regression models, and our contribution makes them even more so!
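To illustrate what "deep" means here: a sample from a two-layer deep GP prior feeds the output of one GP draw in as the input of another. This is prior sampling only, with an assumed RBF kernel, unit hyperparameters and a one-dimensional hidden layer; it does not implement the paper's inference method.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance between two sets of scalar inputs
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def sample_gp_layer(inputs, rng, jitter=1e-6):
    # One draw of a zero-mean GP evaluated at the given inputs
    cov = rbf_kernel(inputs, inputs) + jitter * np.eye(len(inputs))  # jitter for stability
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal(len(inputs))

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 200)
h = sample_gp_layer(x, rng)   # layer 1: h = f1(x), f1 ~ GP
y = sample_gp_layer(h, rng)   # layer 2: y = f2(h), f2 ~ GP, so y = f2(f1(x))
```

The composed function y = f2(f1(x)) is no longer Gaussian in x, which is why inference over the hidden layer needs an approximation.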
Andrew Y.K. Foong, Yingzhen Li, José Miguel Hernández-Lobato and Richard E. Turner. "In-Between" Uncertainty for Bayesian Neural Networks. ICML 2019 workshop on Uncertainty & Robustness in Deep Learning (oral)
It would be fantastic if we could get the "lion king" (neural network transform) to help beat the "demon dragon" (exact posterior)! In this preliminary work we discussed several potential proposals for training a "wild approximation", in which prediction is tractable but the density of q might be difficult to evaluate. We presented it at the NIPS 2016 approximate inference workshop with preliminary results for toy examples and generative models.
Yingzhen Li and Richard E. Turner. A Unifying Approximate Inference Framework from Variational Free Energy Relaxation. NIPS Advances in approximate inference workshop, 2016
Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Yingzhen Li, Thang Bui and Richard E. Turner. Stochastic Expectation Propagation for Large Scale Gaussian Process Classification. NIPS Advances in approximate inference workshop (contributed talk), 2015
Yingzhen Li and Ye Zhang. Generating ordered list of Recommended Items: a Hybrid Recommender System of Microblog. 2012
Approximate Inference: New Visions. PhD in Engineering, University of Cambridge, June 2018.