(To save your time you are encouraged to look at the cartoon illustrations for a taste 😆)

**Working papers**

Cuong V. Nguyen, **Yingzhen Li**, Thang D. Bui and Richard E. Turner. Variational Continual Learning. Submitted. cartoon

(Well, quite a bit of exaggeration actually, we need to make it practical 😆)

"Bayesian Inference is **the natural choice** for online/incremental/continual learning." Conceptually this is nice, but you know deep learning engineers wouldn't go for it, as exact Bayesian inference takes "forever" to run. In this paper we just use variational Bayes to make it tractable and easy to implement for continual learning of deep learning tasks, in both supervised and unsupervised settings.

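To see why Bayesian updating suits continual learning, here is a toy sketch (not the method of the paper): with an exact conjugate Beta-Bernoulli model, using the posterior from one task as the prior for the next gives the same answer as seeing all the data at once. The function name and the toy data are made up for illustration; variational continual learning replaces this exact update with a variational approximation, because deep models have no such closed form.

```python
def update_beta(prior, data):
    """Exact conjugate update of a Beta(a, b) prior with Bernoulli observations (0/1)."""
    a, b = prior
    ones = sum(data)
    return (a + ones, b + len(data) - ones)


# Two "tasks" arriving one after the other (hypothetical toy data).
task1 = [1, 0, 1, 1]
task2 = [0, 0, 1]
prior = (1, 1)  # uniform Beta(1, 1) prior

# Continual learning: yesterday's posterior is today's prior.
posterior_sequential = update_beta(update_beta(prior, task1), task2)

# Batch learning: revisit all the data at once.
posterior_batch = update_beta(prior, task1 + task2)

print(posterior_sequential, posterior_batch)
```
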
**Yingzhen Li** and Richard E. Turner. Gradient Estimators for Implicit Models. Submitted. cartoon

(So why not try gradient approximations instead 😎)

One of the main reasons why people now are super excited about GANs is that the generative model is implicitly specified by a neural network warping simple noise vectors. However, applying probabilistic learning methods to these models requires severe approximations, where an overfitted estimate of the optimisation objective can lead to arbitrarily bad models. In this work, we explore an alternative route -- approximating the gradient function -- for training implicit models. We demonstrate the efficacy of our method with some exciting applications: meta-learning for posterior samplers, and improving the sample diversity of GANs.

**Yingzhen Li**, Richard E. Turner and Qiang Liu. Approximate Inference with Amortised MCMC. Submitted. cartoon

(Yes, this is the whole idea, not much need to explain in equations.)

Amortised MCMC is an extremely flexible approximate inference framework. It is completely up to you to specify the sample generator, the Markov chain transition kernel, and the update rule for that generator. In the experiments, we specifically defined the sample generating process by warping simple randomness through a deep neural network, and the update rule was a GAN-like method but acting in latent space. I'm very excited about this work and would like to see extensions to discrete distributions as well!

**Yingzhen Li** and Qiang Liu. Wild Variational Approximations. Preprint presented at the NIPS Advances in approximate inference workshop, 2016. cartoon

(A zoo of inference engines available as of late 2016)

It would be fantastic if we can get the "lion king" (neural network transform) to help beat the "demon dragon" (exact posterior)! In this preliminary work we discussed several potential proposals to train a "wild approximation" in which prediction is tractable but the density of q might be difficult to evaluate. We presented it at the NIPS 2016 approximate inference workshop with preliminary results for toy examples and generative models.

**Refereed conference papers**

**Yingzhen Li** and Yarin Gal. Dropout Inference in Bayesian Neural Networks with Alpha-divergences. International Conference on Machine Learning (ICML), 2017. code cartoon

(Code for the core idea in Keras. With other helper functions dropout+BB-alpha can be implemented in less than 30 lines.)

(Bayesian NNs can detect adversarial examples (FGS targeted attack))

Show me your loss function, and I will return an objective function that trains a Bayesian NN with BB-alpha and dropout! We extended the dropout variational inference idea to minimising alpha-divergences, and the proposed re-formulation can be implemented very efficiently with any high-level deep learning framework. You can try any dropout technique (Bernoulli, Gaussian, Concrete, dropConnect, and even stochastic depth) and select the best alpha for your needs. As a side-project we also showed that Bayesian NNs can be used to detect adversarial examples, under both untargeted and targeted attacks.
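
A rough sketch of the "loss in, objective out" idea, assuming the per-data-point form -(1/alpha) * log((1/K) * sum_k exp(-alpha * l_k)), where l_k is your usual loss (a negative log-likelihood) under the k-th sampled dropout mask. The function name and the plain-Python style are illustrative only, the weight-decay regulariser is left out, and this is not the released Keras code.

```python
import math


def bb_alpha_dropout_loss(losses, alpha):
    """Combine the losses of K stochastic forward passes for one data point.

    losses[k] is the ordinary loss under the k-th sampled dropout mask.
    """
    K = len(losses)
    if abs(alpha) < 1e-8:
        # alpha -> 0 limit: the plain average of the K losses.
        return sum(losses) / K
    scaled = [-alpha * l for l in losses]
    m = max(scaled)  # subtract the max for a numerically stable log-mean-exp
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / K)
    return -log_mean_exp / alpha


# Hypothetical losses from K = 3 dropout masks on a single data point.
example_losses = [0.2, 1.0, 3.0]
for a in (0.0, 0.5, 1.0):
    print(a, bb_alpha_dropout_loss(example_losses, a))
```

In a deep learning framework the same few lines would act on a tensor of K per-mask losses, with a log-sum-exp over the mask axis.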

**Yingzhen Li** and Richard E. Turner. Rényi Divergence Variational Inference. Neural Information Processing Systems (NIPS), 2016. (Previously titled "Variational Inference with Rényi Divergence") code cartoon

(credit to Ferenc Huszár, thanks for your fun emoji illustration!)

(mean-field approximation for Bayesian linear regression)

The goal of approximate inference is to get accurate posterior approximations quickly. However, the definition of "accurate" really depends on what you actually want. You as the user specify the approximation family that suits your constraints, and **my job here is to provide you a flexible inference framework that allows you to fine-tune the alpha parameter to get the best result for your needs.** No painstaking derivations again and again!
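
To make the role of alpha concrete, here is a minimal sketch of a Monte Carlo estimate of the variational Rényi bound, (1/(1-alpha)) * log((1/K) * sum_k w_k^(1-alpha)), computed from log importance weights log w_k = log p(x, z_k) - log q(z_k) with z_k sampled from q. The function name and the toy weights are hypothetical. Taking alpha towards 1 gives the usual variational lower bound estimate, and alpha = 0 gives an importance-weighted estimate of the log marginal likelihood.

```python
import math


def vr_bound_estimate(log_weights, alpha):
    """Monte Carlo estimate of the variational Renyi bound from K log importance weights."""
    K = len(log_weights)
    if abs(alpha - 1.0) < 1e-8:
        # alpha -> 1 limit: the standard variational lower bound estimate.
        return sum(log_weights) / K
    scaled = [(1.0 - alpha) * lw for lw in log_weights]
    m = max(scaled)  # subtract the max for a numerically stable log-mean-exp
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / K)
    return log_mean_exp / (1.0 - alpha)


# Hypothetical log importance weights from K = 3 samples.
toy_log_weights = [-1.0, -2.0, -3.0]
for a in (0.0, 0.5, 1.0, 2.0):
    print(a, vr_bound_estimate(toy_log_weights, a))
```

For a fixed set of samples the estimate is non-increasing in alpha, so the single knob alpha moves smoothly between tighter and looser objectives.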

José Miguel Hernández-Lobato*, **Yingzhen Li***, Mark Rowland, Daniel Hernández-Lobato, Thang Bui and Richard E. Turner. Black-box α-divergence Minimization. International Conference on Machine Learning (ICML), 2016. code cartoon

A taste of alpha-divergences. (Original visualisation credit: Tom Minka)

Connections to existing methods.

Thang Bui, Daniel Hernández-Lobato, José Miguel Hernández-Lobato, **Yingzhen Li** and Richard E. Turner. Deep Gaussian Processes for Regression using Approximate Expectation Propagation. International Conference on Machine Learning (ICML), 2016. code cartoon

(Fig credit: Wikipedia pages & Thang Bui)

Under some conditions, **a Gaussian process model can be viewed as a one-hidden-layer neural network with an infinite number of hidden units**. So it makes sense to make GPs "deep", and there you go: deep Gaussian processes. However, unlike for regular deep neural networks, inference for deep GPs is non-trivial, so this paper introduces **a scalable inference method** for them using Gaussian propagation. GPs are state-of-the-art regression models and our contribution makes them even more so!

**Yingzhen Li**, José Miguel Hernández-Lobato and Richard E. Turner. Stochastic Expectation Propagation. Neural Information Processing Systems (NIPS), 2015 (**spotlight, 4.5%**). demo cartoon

**Workshop Preprints**

**Yingzhen Li** and Richard E. Turner. A Unifying Approximate Inference Framework from Variational Free Energy Relaxation. NIPS Advances in approximate inference workshop, 2016

Daniel Hernández-Lobato, José Miguel Hernández-Lobato, **Yingzhen Li**, Thang Bui and Richard E. Turner. Stochastic Expectation Propagation for Large Scale Gaussian Process Classification. NIPS Advances in approximate inference workshop (contributed talk), 2015

**Yingzhen Li** and Ye Zhang. Generating ordered list of Recommended Items: a Hybrid Recommender System of Microblog. 2012

**Thesis**

*Compressed Sensing and Related Learning Problems.* *B.S. in Mathematics, Sun Yat-sen University, May 2013.* (Best B.S. Thesis Award). [slides]

Things I've done in my undergrad years