(To save time, you are encouraged to look at the cartoon illustrations for a taste 😆)
Working papers
Wenbo Gong, Yingzhen Li and José Miguel Hernández-Lobato. Sliced Kernelized Stein Discrepancy. To appear at ICLR 2021.
Ruqi Zhang, Yingzhen Li, Chris De Sa, Sam Devlin and Cheng Zhang. Meta-Learning for Variational Inference. To appear at AISTATS 2021.
Refereed conference papers
Andrew Y. K. Foong*, David R. Burt*, Yingzhen Li and Richard E. Turner. On the Expressiveness of Approximate Inference in Bayesian Neural Networks. Neural Information Processing Systems (NeurIPS), 2020.
Cheng Zhang, Kun Zhang and Yingzhen Li. A Causal View on Robustness of Neural Networks. Neural Information Processing Systems (NeurIPS), 2020.
Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin and Katja Hofmann. Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck. Neural Information Processing Systems (NeurIPS), 2019.
Ehsan Shareghi, Yingzhen Li, Yi Zhu, Roi Reichart and Anna Korhonen. Bayesian Learning for Neural Dependency Parsing. NAACL-HLT 2019.
Chao Ma, Yingzhen Li and José Miguel Hernández-Lobato. Variational Implicit Processes. International Conference on Machine Learning (ICML), 2019. code
Yingzhen Li, John Bradshaw and Yash Sharma. Are Generative Classifiers More Robust to Adversarial Attacks? International Conference on Machine Learning (ICML), 2019. code cartoon

(Generative/discriminative classifiers: the graphical model defines their nature)

(Fusing generative/discriminative behaviours: combining VGG-features and generative models)
Adversarial machine learning research has become an arms race, and so far the attacking side is taking the lead. In this paper we conjecture that feed-forward DNNs' vulnerability to adversarial examples is due to their discriminative nature, so instead we build deep generative models and turn them into classifiers using Bayes' rule. An extensive empirical study shows the advantages of generative classifiers over discriminative ones, both in robustness to adversarial examples and in detecting them. We also build a generative-discriminative hybrid for natural image classification, which combines the best of both worlds.
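As a minimal statement of the idea (omitting the approximations needed when $p(x \mid y)$ is a deep latent-variable model, which the paper handles with e.g. variational bounds): fit a class-conditional generative model, then classify by inverting it with Bayes' rule:

$$
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}
$$

A discriminative classifier models $p(y \mid x)$ directly; the two differ exactly in the graphical model behind them, as the first cartoon above suggests.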
Wenbo Gong*, Yingzhen Li* and José Miguel Hernández-Lobato. Meta-Learning for Stochastic Gradient MCMC. International Conference on Learning Representations (ICLR), 2019. code cartoon

(finding the best one from the largest set of provably correct samplers 😆)

(the meta-learned sampler is trained on an isotropic Gaussian distribution)
Recently people have been very excited about meta-learning algorithms, which are then used to optimise ML models. However, naively proposing an arbitrary neural network as a sampler is almost guaranteed to fail. So we really want to meta-learn an MCMC sampler that is provably correct (i.e. its stationary distribution is the target distribution), and we build on top of a seminal work on SG-MCMC to combine the flexibility of neural networks with Hamiltonian Monte Carlo.
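For context, the sampler family comes from the "complete recipe" for SG-MCMC (Ma et al., 2015) that this paper builds on; as a sketch, any diffusion of the form

$$
dz = -\big[D(z) + Q(z)\big]\,\nabla H(z)\,dt + \Gamma(z)\,dt + \sqrt{2 D(z)}\, dW_t,
\qquad
\Gamma_i(z) = \sum_j \frac{\partial}{\partial z_j}\big(D_{ij}(z) + Q_{ij}(z)\big)
$$

leaves the target $\pi(z) \propto \exp(-H(z))$ invariant whenever $D(z)$ is positive semi-definite and $Q(z)$ is skew-symmetric. So $D$ and $Q$ can safely be parameterised by neural networks and meta-learned: correctness is guaranteed by construction, and only efficiency is learned.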
Yingzhen Li and Stephan Mandt. Disentangled Sequential Autoencoder. International Conference on Machine Learning (ICML), 2018. sprites data architecture cartoon

(embarrassingly simple 🙈)
While quite a few recent papers propose new loss functions for learning disentangled representations, I thought: why not go back to graphical-model land and directly enforce disentanglement in the model design? So here it is: a minimalistic generative model for disentangling content and dynamics in a sequence (sketched below). We include a speech recognition example in the paper as well, and we also show that using stochastic dynamics could potentially achieve better compression.
Part of this project was done during an internship at Disney Research.
(Previously titled "A Deep Generative Model for Disentangled Representations of Sequential Data")
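The graphical model in one line (a sketch; $f$ is the sequence-level "content" variable and the $z_t$ are per-frame "dynamics" variables):

$$
p(x_{1:T}, z_{1:T}, f) = p(f) \prod_{t=1}^{T} p(z_t \mid z_{<t})\, p(x_t \mid z_t, f)
$$

Because $f$ is shared across all time steps while the $z_t$ evolve, content and dynamics are separated by design rather than by an extra loss term.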
Cuong V. Nguyen, Yingzhen Li, Thang D. Bui and Richard E. Turner. Variational Continual Learning. International Conference on Learning Representations (ICLR), 2018. code cartoon
(Well, quite a bit of an exaggeration actually; we need to make it practical 😆)
"Bayesian Inference is **the natural choice** for online/incremental/continual learning." Conceptually this is nice, but you know deep learning engineers wouldn't bid for it as exact Bayesian inference takes "forever" to run. In this paper we just use variational Bayes to make it tractable and easy to implement for continual learning of deep learning tasks. For both supervised and unsupervised learning.
Yingzhen Li and Richard E. Turner. Gradient Estimators for Implicit Models. International Conference on Learning Representations (ICLR), 2018. code cartoon

(So why not try gradient approximations instead 😎)
One of the main reasons people are now super excited about GANs is that the generative model is implicitly specified by a neural network warping simple noise vectors. However, applying probabilistic learning methods to these models requires severe approximations, and a poorly estimated optimisation objective can lead to arbitrarily bad models. In this work, we explore an alternative route -- approximating the gradient function -- for training implicit models. We demonstrate the efficacy of our method with some exciting applications: meta-learning for posterior samplers, and improving GANs' sample diversity.
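For a taste of the approach, here is a minimal NumPy sketch of a kernel-based (Stein-type) score estimator in the spirit of the paper: it estimates $\nabla_x \log q(x)$ at a set of samples from $q$ using only the samples themselves. The function name, the fixed RBF bandwidth and the ridge coefficient are illustrative choices, not the paper's exact recipe.

```python
import numpy as np

def stein_score_estimator(x, eta=1e-3, sigma=1.0):
    """Estimate grad_x log q(x) at samples x ~ q (shape (K, d))
    by inverting Stein's identity with an RBF kernel; a sketch."""
    K, _ = x.shape
    diff = x[:, None, :] - x[None, :, :]           # diff[i, j] = x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)
    Kxx = np.exp(-sqdist / (2 * sigma ** 2))       # RBF kernel matrix
    # <grad, K>_j = sum_k grad_{x_k} k(x_k, x_j)
    #             = sum_k k(x_k, x_j) (x_j - x_k) / sigma^2 for the RBF kernel
    grad_K = np.einsum('kj,kjd->jd', Kxx, -diff) / sigma ** 2
    # Stein's identity gives Kxx @ G + <grad, K> ~= 0; solve with a ridge term
    return -np.linalg.solve(Kxx + eta * np.eye(K), grad_K)
```

Plugging such estimated scores into the chain rule yields approximate gradients for training the implicit generator.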
Yingzhen Li and Yarin Gal. Dropout Inference in Bayesian Neural Networks with Alpha-divergences. International Conference on Machine Learning (ICML), 2017. code cartoon
(Code for the core idea in Keras. With other helper functions, dropout+BB-alpha can be implemented in fewer than 30 lines.)

(Bayesian NNs can detect adversarial examples (targeted FGS attack))
Show me your loss function, and I will return an objective function that trains a Bayesian NN with BB-alpha and dropout! We extended the dropout variational inference idea to minimising alpha-divergences, and the proposed reformulation can be implemented very efficiently in any high-level deep learning framework. You can try any dropout technique (Bernoulli, Gaussian, Concrete, dropConnect, and even stochastic depth) and select the best alpha for your needs. As a side project we also showed that Bayesian NNs can be used to detect adversarial examples, under both untargeted and targeted attacks.
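Roughly, the reformulated objective looks like this (a sketch following the paper's recipe; $\hat\omega_k$ denotes the $k$-th of $K$ stochastic forward passes with independent dropout masks, and the regulariser is the usual weight decay from dropout variational inference):

$$
\hat{\mathcal{L}}_\alpha = -\frac{1}{\alpha} \sum_{n} \log \frac{1}{K} \sum_{k=1}^{K} p\big(y_n \mid x_n, \hat\omega_k\big)^{\alpha} \;+\; \text{regulariser}
$$

As $\alpha \to 0$ the inner log-mean-exp collapses to an average of log-likelihoods and standard dropout variational inference is recovered, while larger $\alpha$ values move towards EP-like, mass-covering behaviour.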
Yingzhen Li and Richard E. Turner. Rényi Divergence Variational Inference. Neural Information Processing Systems (NIPS), 2016. code cartoon

(credit to Ferenc Huszár, thanks for your fun emoji illustration!)

(mean-field approximation for Bayesian linear regression)
The goal of approximate inference is to get accurate posterior approximations quickly. However, the definition of "accurate" really depends on what you actually want. You, as the user, specify the approximation family that suits your constraints, and my job here is to provide a flexible inference framework that lets you fine-tune the alpha parameter in the bound below to get the best fit for your needs. No painstaking derivations again and again!
(Previously titled "Variational Inference with Rényi Divergence")
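The single quantity behind the framework is the variational Rényi (VR) bound, a one-parameter generalisation of the ELBO:

$$
\mathcal{L}_\alpha(q; x) = \frac{1}{1-\alpha} \log \mathbb{E}_{z \sim q}\!\left[ \left( \frac{p(x, z)}{q(z)} \right)^{\!1-\alpha} \right]
$$

Taking $\alpha \to 1$ recovers the standard ELBO, $\alpha = 0$ gives the exact log marginal likelihood (estimated by Monte Carlo in practice), and intermediate values trade off mass-covering against mode-seeking behaviour.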
José Miguel Hernández-Lobato*, Yingzhen Li*, Mark Rowland, Daniel Hernández-Lobato, Thang Bui and Richard E. Turner. Black-box α-divergence Minimization. International Conference on Machine Learning (ICML), 2016. code cartoon

A taste of alpha-divergences. (Original visualisation credit: Tom Minka)

Connections to existing methods.
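For reference, the alpha-divergence being minimised (in Minka's convention, for normalised $p$ and $q$) is

$$
D_\alpha(p \,\|\, q) = \frac{1}{\alpha(1-\alpha)} \left( 1 - \int p(\theta)^{\alpha}\, q(\theta)^{1-\alpha}\, d\theta \right)
$$

with $\alpha \to 0$ and $\alpha \to 1$ recovering $\mathrm{KL}(q \| p)$ and $\mathrm{KL}(p \| q)$ respectively, so variational Bayes and expectation propagation sit at the two ends of the same family.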
Thang Bui, Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Yingzhen Li and Richard E. Turner. Deep Gaussian Processes for Regression using Approximate Expectation Propagation. International Conference on Machine Learning (ICML), 2016. code cartoon

(Fig credit: Wikipedia pages & Thang Bui)
Under some conditions, a Gaussian process model can be viewed as a one-hidden-layer neural network with an infinite number of hidden units. So it makes sense to make GPs "deep", and there you have it: deep Gaussian processes. However, unlike for regular deep neural networks, inference in deep GPs is non-trivial, so this paper introduces a scalable inference method for them using Gaussian propagation. GPs are state-of-the-art regression models, and our contribution makes them even better!
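The generative story is just function composition (a sketch for regression with $L$ GP layers; the paper's contribution is the approximate EP scheme that keeps inference in this model tractable):

$$
f^{(l)} \sim \mathcal{GP}\big(0, k^{(l)}\big),\ \ l = 1, \dots, L,
\qquad
y = f^{(L)}\big(\cdots f^{(1)}(x) \cdots\big) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)
$$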
Yingzhen Li, José Miguel Hernández-Lobato and Richard E. Turner. Stochastic Expectation Propagation. Neural Information Processing Systems (NIPS), 2015 (spotlight, 4.5%). demo cartoon
Workshop preprints
Sebastian Lunz, Yingzhen Li, Andrew Fitzgibbon and Nate Kushman. Inverse Graphics GAN: Learning to Generate 3D Shapes from Unstructured 2D Data. NeurIPS 2020 workshop on Differentiable vision, graphics, and physics applied to machine learning (DiffCVGP), 2020.
Chaochao Lu, Richard E. Turner, Yingzhen Li and Nate Kushman. Interpreting Spatially Infinite Generative Models. ICML 2020 Workshop on Human Interpretability in Machine Learning (WHI), 2020.
Andrew Y.K. Foong, Yingzhen Li, José Miguel Hernández-Lobato and Richard E. Turner. "In-Between" Uncertainty for Bayesian Neural Networks. ICML 2019 workshop on Uncertainty & Robustness in Deep Learning (oral), 2019.
Yingzhen Li. Approximate Gradient Descent for Training Implicit Generative Models. NIPS 2017 Bayesian Deep Learning workshop, 2017.
Yingzhen Li, Richard E. Turner and Qiang Liu. Approximate Inference with Amortised MCMC. ICML 2017 Workshop on Implicit Generative Models, 2017. cartoon

(Yes, this is the whole idea; not much need to explain it in equations.)
Amortised MCMC is an extremely flexible approximate inference framework. It is completely up to you to specify the sample generator, the Markov chain transition kernel, and the update rule for the generator. In our experiments we defined the sample-generating process by warping simple randomness through a deep neural network, and the update rule was a GAN-like method acting in latent space. I'm very excited about this work and would like to see extensions to discrete distributions as well!
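A minimal pseudocode sketch of the training loop; `generator`, `mcmc_step` and `update_generator` are placeholders for the three user-specified components described above, not names from the paper:

```python
def amortised_mcmc(generator, mcmc_step, update_generator,
                   num_iters=1000, chain_length=5):
    """A hedged sketch of amortised MCMC: refine the generator's
    samples with a few MCMC steps, then chase the refined samples."""
    for _ in range(num_iters):
        z = generator.sample()                  # fast approximate samples
        z_improved = z
        for _ in range(chain_length):           # refine with T MCMC steps
            z_improved = mcmc_step(z_improved)  # kernel targets the posterior
        # Move the generator towards the refined samples, e.g. with a
        # GAN-like update acting in latent space as in the experiments.
        update_generator(generator, z, z_improved)
    return generator
```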
Yingzhen Li and Qiang Liu. Wild Variational Approximations. Preprint presented at the NIPS Advances in Approximate Inference workshop, 2016. cartoon
(A zoo of inference engines available as of late 2016)
It would be fantastic if we could get the "lion king" (neural network transform) to help beat the "demon dragon" (exact posterior)! In this preliminary work we discussed several potential proposals for training a "wild approximation", in which prediction is tractable but the density of q might be difficult to evaluate. We presented it at the NIPS 2016 approximate inference workshop with preliminary results on toy examples and generative models.
Yingzhen Li and Richard E. Turner. A Unifying Approximate Inference Framework from Variational Free Energy Relaxation. NIPS Advances in Approximate Inference workshop, 2016.
Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Yingzhen Li, Thang Bui and Richard E. Turner. Stochastic Expectation Propagation for Large Scale Gaussian Process Classification. NIPS Advances in Approximate Inference workshop (contributed talk), 2015.
Yingzhen Li and Ye Zhang. Generating ordered list of Recommended Items: a Hybrid Recommender System of Microblog. 2012
Theses
Approximate Inference: New Visions. PhD in Engineering, University of Cambridge, June 2018.
Compressed Sensing and Related Learning Problems. B.S. in Mathematics, Sun Yat-sen University, May 2013. (Best B.S. Thesis Award). [slides]
Things I've done in my undergrad years