Final projects

1) Install and load the bayesm package in R. Load the Scotch data using the command data(Scotch). These data consist of Yes/No survey answers from 2,218 individuals reporting which of 20 Scotch whisky brands (and one “Other” category) they have bought in the past year.
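A minimal sketch of the loading step, assuming bayesm is already installed. It also drops the “Other” column, which the analysis calls for. Matching the column name on the string "Other" is an assumption about how the data frame labels that category, so check names(Scotch) before relying on it. The object names keep and Y are illustrative.

```r
# Assumes the bayesm package is installed: install.packages("bayesm")
library(bayesm)
data(Scotch)

dim(Scotch)      # individuals in rows, brand indicators in columns
names(Scotch)    # inspect the brand labels

# Drop the "Other" category; matching on the string "Other" is an assumption
# about the column label, so confirm it against names(Scotch).
keep <- !grepl("Other", names(Scotch))
Y <- as.matrix(Scotch[, keep, drop = FALSE])

# Share of respondents reporting each brand, largest first
round(sort(colMeans(Y), decreasing = TRUE), 3)
```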

To analyze this data, first discard the variable corresponding to “Other”. Then, fit a Gaussian factor model to the Scotch data using a probit link:

\mbox{Pr}(Y_{ji} = 1 \mid \mu_j, B_j, f_i) = \Phi(\mu_j + B_jf_i)

where Y_{ji} indicates whether individual i reported buying brand j, \Phi is the standard normal CDF, B_j is the jth row of the loadings matrix \mathbf{B}, and f_i \sim N(0,I) is a latent (unobserved) random vector of length k. Assume a lower triangular structure for \mathbf{B} (so that the entries above the main diagonal are all zero). Use conditionally conjugate priors. By analyzing the columns of \mathbf{B}, along with some Google searches, propose measurable features of the various Scotch brands that might account for the observed correlations in the Scotch survey data. Repeat this process for several different choices of k.
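One common route to conditionally conjugate updates under a probit link is Albert and Chib style data augmentation: introduce latent Z_{ji} \sim N(\mu_j + B_jf_i, 1) with Y_{ji} = 1 exactly when Z_{ji} > 0. The sketch below is one possible sampler under that approach, not a prescribed solution. The function name probit_factor_gibbs, the N(0, s2) priors on \mu_j and on the free loadings, and the simulated check are all assumptions made for illustration. It also places no sign constraint on the diagonal of \mathbf{B}.

```r
# Sketch only: Gibbs sampler for a probit factor model via data augmentation.
# Assumed priors (not specified in the assignment): mu_j and the free entries
# of B are independent N(0, s2). Y is an n x p 0/1 matrix (rows = individuals).
probit_factor_gibbs <- function(Y, k, n_iter = 200, s2 = 1) {
  n <- nrow(Y); p <- ncol(Y)
  S  <- 2 * Y - 1                        # +1 where Y = 1, -1 where Y = 0
  mu <- rep(0, p)
  B  <- matrix(0, p, k)                  # entries with column > row stay zero
  Fm <- matrix(rnorm(n * k), n, k)       # row i holds the latent vector f_i
  B_draws <- array(NA_real_, dim = c(n_iter, p, k))

  for (it in seq_len(n_iter)) {
    # 1. Z_ji ~ N(mu_j + B_j f_i, 1), truncated to Z > 0 if Y = 1, Z < 0 if Y = 0
    M <- sweep(Fm %*% t(B), 2, mu, "+")
    U <- matrix(runif(n * p), n, p)
    Z <- M - S * qnorm(U * pnorm(S * M))

    # 2. f_i | rest ~ N(V B'(Z_i - mu), V) with V = (I + B'B)^{-1}
    V  <- solve(diag(k) + crossprod(B))
    Fm <- sweep(Z, 2, mu, "-") %*% B %*% V +
      matrix(rnorm(n * k), n, k) %*% chol(V)

    # 3. (mu_j, free entries of row B_j) | rest: conjugate normal regression
    for (j in seq_len(p)) {
      h <- min(j, k)                     # row j has h free loadings
      X <- cbind(1, Fm[, seq_len(h), drop = FALSE])
      R <- chol(crossprod(X) + diag(h + 1) / s2)    # posterior precision = R'R
      m <- chol2inv(R) %*% crossprod(X, Z[, j])     # posterior mean
      draw <- drop(m) + backsolve(R, rnorm(h + 1))  # mean + N(0, precision^{-1})
      mu[j] <- draw[1]
      B[j, seq_len(h)] <- draw[-1]
    }
    B_draws[it, , ] <- B
  }
  list(B = B_draws, mu = mu)
}

# Small simulated check (not the Scotch data)
set.seed(1)
n <- 300; p <- 6; k <- 2
B_true <- matrix(rnorm(p * k), p, k); B_true[upper.tri(B_true)] <- 0
F_true <- matrix(rnorm(n * k), n, k)
Y_sim  <- (F_true %*% t(B_true) + matrix(rnorm(n * p), n, p) > 0) * 1
fit    <- probit_factor_gibbs(Y_sim, k = 2, n_iter = 100)
B_hat  <- apply(fit$B[51:100, , ], c(2, 3), mean)   # posterior mean after burn-in
round(B_hat, 2)
```

For the survey itself, Y would be the 0/1 matrix of the 20 retained brands, and the sampler would be rerun for each k of interest with many more iterations. Because the sketch does not constrain the diagonal of \mathbf{B} to be positive, the signs of whole columns are not pinned down and can differ between runs.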

2) Augment the Scotch data with a vector denoting which brands possess which of your proposed features. Refit the model, now including these covariates, denoted here as a length-d vector X_j for brand j. Now the model becomes

\mbox{Pr}(Y_{ji} = 1 \mid \mu_j, B_j, f_i, X_j, \beta) = \Phi(\mu_j + X_j^t\beta + B_jf_i).

Use k - d factors this time. Use independent point-mass variable selection priors for the elements \beta_\ell, \ell = 1, \dots, d, of \beta:

\pi(\beta_\ell) = q \phi(\beta_\ell \mid 0, v) + (1-q)\delta_0

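and give q an independent uniform prior. Here \phi(\cdot \mid 0, v) is the normal density with mean 0 and variance v, and \delta_0 is a point mass at zero. Examine the posterior inclusion probabilities and the posterior of \mathbf{B} to argue whether or not your observed factors (the proposed features) have adequately described the covariation in the data.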
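The point-mass prior leads to a simple conditional update once the model is written in terms of latent normal utilities Z_{ji} with unit variance, as in a data-augmented probit sampler. The sketch below shows one such update, under the assumption that a single q is shared by all elements of \beta. The function name update_beta, its arguments, and the simulated check are hypothetical and only illustrate the calculation. Each element is set to zero or drawn from its normal conditional according to the conditional inclusion probability, and q is then drawn from its Beta conditional.

```r
# Sketch only: one Gibbs update of beta under the point-mass mixture prior,
# followed by the update of q under a Uniform(0, 1) = Beta(1, 1) prior.
# resid: n x p matrix Z_ji - mu_j - B_j f_i (latent utilities minus everything
#        except the X_j' beta term); X: p x d matrix of brand features.
update_beta <- function(resid, X, beta, q, v = 1) {
  n <- nrow(resid); d <- ncol(X)
  s <- colSums(resid)                       # per-brand sums over individuals
  for (l in seq_len(d)) {
    other   <- drop(X %*% beta) - X[, l] * beta[l]  # fit from the other features
    partial <- s - n * other                # residual sums with feature l left out
    a    <- n * sum(X[, l]^2)
    b    <- sum(X[, l] * partial)
    prec <- a + 1 / v                       # conditional precision under the slab
    # log Bayes factor of the normal slab against the point mass at zero
    log_bf <- -0.5 * log1p(v * a) + b^2 / (2 * prec)
    incl   <- plogis(log(q) - log1p(-q) + log_bf)   # conditional inclusion prob.
    beta[l] <- if (runif(1) < incl) rnorm(1, b / prec, sqrt(1 / prec)) else 0
  }
  q <- rbeta(1, 1 + sum(beta != 0), 1 + sum(beta == 0))
  list(beta = beta, q = q)
}

# Small simulated check: feature 1 matters, feature 2 does not
set.seed(2)
n <- 500; p <- 8
X <- cbind(rep(c(1, 0), 4), rep(c(1, 1, 0, 0), 2))
beta_true <- c(1.5, 0)
resid <- matrix(rnorm(n * p), n, p) + rep(drop(X %*% beta_true), each = n)
out <- update_beta(resid, X, beta = c(0, 0), q = 0.5)
out$beta
```

Inside a full sampler this step would run once per iteration, and the posterior inclusion probability of feature \ell can be estimated by the fraction of saved draws in which that element of \beta is nonzero.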
