Stochastic search variable selection

The topic of today’s post is Bayesian “variable selection” using point-mass mixture priors. This builds off of the previous post, adapting those ideas concretely to the linear regression setting.

The key reference for this approach to variable selection is George and McCulloch (1993); see also the literature review of Hahn and Carvalho (2015).

The model is simply the homoskedastic, Gaussian linear regression model:

Y_i = X_i^t\beta + \epsilon_i

where \epsilon_i \sim \mbox{N}(0,\sigma^2) and \beta is a p-dimensional column vector of regression coefficients.

As in the previous post, we assume that the prior for each \beta_j has density

\pi(\beta_j) = q \phi(\beta_j \mid 0, v) + (1-q)\delta_0(\beta_j),

where \phi(\cdot \mid m, v) is a Gaussian density function with mean m and variance v. The Gibbs sampler then follows the same steps described in the previous post, except that the conjugate portion of the model uses the conjugate regression update for the “residualized” regression model

Y_i - X_{i,-j}^t\beta_{-j} = X_{i,j}\beta_j + \epsilon_i.
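Filling in that conditional update explicitly (treating \sigma^2 as known for simplicity, and writing r_i = Y_i - X_{i,-j}^t\beta_{-j} for the residual on the left-hand side above), the conjugate slab update is

v_j = \left( \frac{1}{v} + \frac{\sum_i X_{i,j}^2}{\sigma^2} \right)^{-1}, \qquad m_j = \frac{v_j}{\sigma^2} \sum_i X_{i,j} r_i,

and the full conditional probability that \beta_j is non-zero follows from the ratio of marginal likelihoods under the slab versus the point mass:

\Pr(\beta_j \neq 0 \mid -) = \frac{q \sqrt{v_j/v}\, \exp\{m_j^2/(2v_j)\}}{q \sqrt{v_j/v}\, \exp\{m_j^2/(2v_j)\} + (1-q)}.

With that probability we draw \beta_j \sim \mbox{N}(m_j, v_j); otherwise we set \beta_j = 0.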

An R script implementing this approach is here. Note that we also place a uniform prior over the fraction of non-zero components, q.
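For readers who prefer Python to the linked R script, here is a minimal sketch of the sampler under simplifying assumptions: \sigma^2 is treated as known, and the slab variance v is fixed. The function name and defaults are illustrative, not from the original script.

```python
import numpy as np

rng = np.random.default_rng(0)

def ssvs_gibbs(X, y, sigma2=1.0, v=10.0, n_iter=2000, rng=rng):
    """Sketch of a spike-and-slab Gibbs sampler (sigma2 assumed known).

    Prior: beta_j ~ q * N(0, v) + (1 - q) * delta_0, with q ~ Uniform(0, 1).
    Returns an (n_iter, p) array of sampled coefficient vectors.
    """
    n, p = X.shape
    beta = np.zeros(p)
    q = 0.5
    draws = np.zeros((n_iter, p))
    xtx = np.sum(X**2, axis=0)  # sum_i X_{i,j}^2 for each column j
    for it in range(n_iter):
        for j in range(p):
            # residual with beta_j's contribution removed
            r = y - X @ beta + X[:, j] * beta[j]
            # conjugate slab update for the residualized regression
            vj = 1.0 / (1.0 / v + xtx[j] / sigma2)
            mj = vj * (X[:, j] @ r) / sigma2
            # log posterior odds of inclusion
            log_odds = (np.log(q) - np.log1p(-q)
                        + 0.5 * (np.log(vj) - np.log(v))
                        + 0.5 * mj**2 / vj)
            p_incl = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700, 700)))
            beta[j] = rng.normal(mj, np.sqrt(vj)) if rng.random() < p_incl else 0.0
        # uniform prior on q gives a Beta(1 + #nonzero, 1 + #zero) full conditional
        k = np.count_nonzero(beta)
        q = rng.beta(1 + k, 1 + p - k)
        draws[it] = beta
    return draws
```

Averaging the indicator draws[it, j] != 0 over post-burn-in iterations gives the posterior inclusion probability of each coefficient.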
