What is approximate inference?

Why should I care about it?

Now you might ask: why should I care about this approximate integral in the first place? Couldn't we just forget about integrals and do point estimation instead? Or couldn't we just use models that support exact inference (e.g. stacking invertible transformations, auto-regressive models, sum-product networks)? The first question is really the Frequentist-versus-Bayesian debate, which I won't say too much about here. But I shall emphasize that *integration is almost everywhere in probability and statistics*, e.g. calculating expectations and computing marginal distributions. As for the second question, those models are usually more computationally demanding (in both time and memory), which is exactly the reason to avoid them and do approximate inference instead when you don't have that much computational resource. **Approximate inference only makes sense when you do have computational constraints, and how you do approximate inference also depends on how much precision you want and how much computational resource you have.**
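To make the claim concrete, both examples are integrals against the target distribution $p$ (standard definitions; $z$ denotes a latent variable):

$$\mathbb{E}_{p(x)}[f(x)] = \int f(x)\, p(x)\,\mathrm{d}x, \qquad p(x) = \int p(x, z)\,\mathrm{d}z,$$

and for most models of interest neither integral has a closed form, which is where approximation comes in.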

What is this list about?

There are mainly two ways to do approximate inference: directly approximating the integral you want, or finding an accurate approximation to the target distribution and using it for integration later. The first approach is dominated by sampling methods (perhaps with student models to distill them) and quadrature. The second one, sometimes referred to as the indirect approach, will be the focus of this list.
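A minimal sketch of the two routes, on a toy Gaussian-mixture target (a hypothetical stand-in for an intractable posterior): the direct route estimates the integral $\mathbb{E}[\cos x]$ by Monte Carlo, while the indirect route first fits a simple Gaussian $q$ by moment matching and then integrates against $q$ in closed form, using the identity $\mathbb{E}_{\mathcal{N}(\mu,\sigma^2)}[\cos x] = e^{-\sigma^2/2}\cos\mu$.

```python
import math
import random

random.seed(0)

# Toy target: an equal mixture of N(-1, 0.5^2) and N(1, 0.5^2).
def sample_target():
    if random.random() < 0.5:
        return random.gauss(-1.0, 0.5)
    return random.gauss(1.0, 0.5)

samples = [sample_target() for _ in range(100_000)]

# Direct approach: Monte Carlo -- approximate the integral E[cos(x)] itself.
mc_estimate = sum(math.cos(x) for x in samples) / len(samples)

# Indirect approach: fit a Gaussian q(x) = N(mu, var) by moment matching,
# then integrate against q analytically: E_q[cos(x)] = exp(-var/2) * cos(mu).
mu = sum(samples) / len(samples)
var = sum((x - mu) ** 2 for x in samples) / len(samples)
q_estimate = math.exp(-var / 2.0) * math.cos(mu)

print(mc_estimate)  # close to the true value exp(-1/8) * cos(1)
print(q_estimate)   # biased: the unimodal q cannot capture the bimodal target
```

The gap between the two numbers is the point: the direct estimate converges to the true integral as the sample size grows, while the indirect estimate is only as good as the chosen family for $q$ — exactly the precision-versus-cost trade-off discussed above.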

(Posts on specific topics coming 🙂 )

Front matter

Theme I: algorithms for fitting approximate distributions:

- variational inference (VI);
- Monte Carlo VI and gradient estimators;
- lower-bounds and upper-bounds;
- message passing, belief/expectation propagation;
- optimisation meets VI (proximal gradient, trust-region methods, etc.);
- wild approximate inference (with implicit q distributions).

Theme II: approximate distribution design:

- invertible transform;
- non-parametric approximations (e.g. empirical distributions in sampling);
- latent variable models as approximate distributions;
- perturbations/corrections after fitting a simple approximation;
- bridging VI and MCMC.

Theme III: applications:

- learning generative models/latent variable models;
- Bayesian neural networks;
- Gaussian processes;
- probabilistic graphical models.