Gaussian mixture model

In class on Wednesday I briefly described a two-component mixture model and I “live-coded” an R implementation of a Gibbs sampler for performing posterior inference. In this post I’m going to revisit those same points.

A Gaussian mixture model has a density function that is a weighted sum of Gaussian density functions, with weights that are probabilities summing to one. The components can be multivariate Gaussians, but here we will consider a univariate random variable and two components. So the density function is

f(y) = w \phi(y; \mu_1, \sigma^2_1) + (1-w) \phi(y; \mu_0, \sigma^2_0),

where \phi(\cdot; m, v) is the Gaussian pdf with mean m and variance v . Each of the individual density functions is referred to as a “component” of the mixture.

One can (should?) think of the random variable Y as being drawn compositionally: first draw a (“latent”) random variable \gamma , then draw Y from a Gaussian distribution with parameters (\mu_{\gamma}, \sigma^2_{\gamma}). That is,

Y = \gamma Z_1 + (1-\gamma) Z_0

where \gamma \sim \mbox{Bernoulli}(w), Z_1 \sim \mbox{N}(\mu_1, \sigma^2_1), and Z_0 \sim \mbox{N}(\mu_0, \sigma^2_0).

Note: beware that this is not the same as w Z_1 + (1-w) Z_0, which is normally distributed (it is a linear combination of independent Gaussians), whereas Y = \gamma Z_1 + (1-\gamma) Z_0 has the mixture density f(y) above.
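To see the distinction concretely, here is a quick simulation sketch in R (the parameter values are arbitrary choices for illustration, not the ones from class): drawing Y via the latent indicator produces a bimodal mixture, while the linear combination w Z_1 + (1-w) Z_0 is a unimodal normal.

```r
# Simulate from the two-component mixture via the latent indicator,
# and contrast with the (normally distributed) linear combination.
set.seed(1)
n   <- 10000
w   <- 0.3
mu1 <- 2;  s1 <- 1
mu0 <- -2; s0 <- 1

gam <- rbinom(n, 1, w)                        # latent component indicators
y   <- rnorm(n,
             mean = ifelse(gam == 1, mu1, mu0),
             sd   = ifelse(gam == 1, s1,  s0))  # mixture draws

z1   <- rnorm(n, mu1, s1)
z0   <- rnorm(n, mu0, s0)
ylin <- w * z1 + (1 - w) * z0                 # normal, NOT the mixture

# The mixture is bimodal; the linear combination is unimodal.
hist(y,    breaks = 50, main = "mixture draws")
hist(ylin, breaks = 50, main = "w*Z1 + (1-w)*Z0")
```

Both variables have the same mean, w \mu_1 + (1-w) \mu_0, but very different shapes.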

To perform Bayesian inference on the five parameters in the model, (w, \mu_1, \mu_0, \sigma^2_1, \sigma^2_0), we must specify priors and then compute the posterior distribution. For priors we will use conditionally conjugate specifications, allowing us to use things we’ve already learned about normal models, inverse-gamma priors, beta-binomial models, etc.

The posterior distribution will not be in closed form, so we will resort (gladly) to posterior sampling using a Gibbs sampler. Recall that a Gibbs sampler cycles through the parameters, one at a time, and draws each parameter from its “full conditional” — the posterior distribution of that parameter given all the other parameters (each fixed at the value of its most recent draw from the Markov chain).
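As one example of such a full-conditional draw: given the latent indicators, the weight w has a beta full conditional under a beta prior (the Beta(1, 1) choice below is an illustrative assumption).

```r
# One full-conditional update: given the indicator vector, w has a
# Beta(a + n1, b + n0) full conditional under a Beta(a, b) prior,
# where n1 and n0 count the observations in each component.
a <- 1; b <- 1                      # illustrative Beta(1, 1) prior
gam <- c(1, 0, 0, 1, 1)             # toy indicator draws
w_draw <- rbeta(1, a + sum(gam), b + sum(1 - gam))
```

The updates for the means and variances are analogous normal–normal and normal–inverse-gamma conjugate draws, each using only the observations currently assigned to that component.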

The key to this strategy is to actually sample the latent indicator variables \gamma_i . With the \gamma vector in hand, all of the other updates come from standard exponential-family models with known closed forms (available in convenient form on my favorite web site). Similarly, the updates for the \gamma_i arise as a straightforward application of Bayes’ rule. The probability that observation Y_i arose from component one (with parameters (\mu_1, \sigma^2_1)) is just:

\mbox{Pr}(\gamma_i = 1 \mid Y_{1:n}, \mu_1, \mu_0, \sigma^2_1, \sigma^2_0, w) = \frac{w \, \phi(y_i; \mu_1, \sigma^2_1)}{w \, \phi(y_i; \mu_1, \sigma^2_1) + (1-w)\, \phi(y_i; \mu_0, \sigma^2_0)}.

Observe that in this expression Y_i is conditionally independent of all Y_{-i} given the component parameters.
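Putting the pieces together, here is a sketch of the whole sampler (not the linked script — the hyperparameters and crude initializations below are illustrative assumptions):

```r
# Sketch of the Gibbs sampler under conditionally conjugate priors:
# w ~ Beta(1, 1), mu_k ~ N(0, 100), sigma^2_k ~ Inverse-Gamma(2, 2).
gibbs <- function(y, n_iter = 5000) {
  n <- length(y)
  a_w <- 1; b_w <- 1        # beta prior on w (assumed)
  m <- 0; v_mu <- 100       # normal prior on each mean (assumed)
  a_v <- 2; b_v <- 2        # inverse-gamma prior on variances (assumed)
  # crude initial values
  w <- 0.5; mu1 <- max(y); mu0 <- min(y); v1 <- var(y); v0 <- var(y)
  draws <- matrix(NA, n_iter, 5,
                  dimnames = list(NULL, c("w", "mu1", "mu0", "v1", "v0")))
  for (t in 1:n_iter) {
    # latent indicators: the Bayes-rule probability derived above
    p1 <- w * dnorm(y, mu1, sqrt(v1))
    p0 <- (1 - w) * dnorm(y, mu0, sqrt(v0))
    g  <- rbinom(n, 1, p1 / (p1 + p0))
    n1 <- sum(g); n0 <- n - n1
    # beta-binomial update for the weight
    w <- rbeta(1, a_w + n1, b_w + n0)
    # normal-normal updates for the means
    pv1 <- 1 / (1 / v_mu + n1 / v1)
    mu1 <- rnorm(1, pv1 * (m / v_mu + sum(y[g == 1]) / v1), sqrt(pv1))
    pv0 <- 1 / (1 / v_mu + n0 / v0)
    mu0 <- rnorm(1, pv0 * (m / v_mu + sum(y[g == 0]) / v0), sqrt(pv0))
    # inverse-gamma updates for the variances
    v1 <- 1 / rgamma(1, a_v + n1 / 2, b_v + sum((y[g == 1] - mu1)^2) / 2)
    v0 <- 1 / rgamma(1, a_v + n0 / 2, b_v + sum((y[g == 0] - mu0)^2) / 2)
    draws[t, ] <- c(w, mu1, mu0, v1, v0)
  }
  draws
}
```

Each pass through the loop updates the indicators, the weight, the means, and the variances from their respective full conditionals, conditioning on the most recent values of everything else.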

See this R script for a simple implementation of the Gibbs sampler, from which you can find the finer details. (As always, let me know by email if you spot typos.)
