Jim — Dec 1, 2013, 10:29 PM
##################################################################
# Bayesian probit regression
# (Illustration of Albert-Chib algorithm)
##################################################################
# To illustrate Bayesian fitting of a probit regression model,
# consider data on 30 college students. For the ith student, we
# collect her SAT score SATi and an indicator yi (1 or 0) if she
# passed a statistics class. If pi = P(yi = 1) denotes
# the probability the ith student passes the class, the probit model
# is represented as
# F^{â1}(pi) = beta0 + beta1 SATi,
# where F^{â1}(pi) is the inverse cdf of the standard normal.
# Suppose we place a uniform prior on the regression vector
# (beta0, beta1)
# We first place the grades and sat scores for the 30 students in
# the vectors grade and sat.
grade <- c(0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1)
sat <- c(525, 533, 545, 582, 581, 576, 572, 609, 559, 543, 576,
525, 574, 582, 574, 471, 595, 557, 557, 584, 599, 517, 649,
584, 463, 591, 488, 563, 553, 549)
# We perform the usual mle fit of this model by the glm command.
fit1 <- glm(grade ~ sat, family = binomial(link = "probit"))
summary(fit1)
Call:
glm(formula = grade ~ sat, family = binomial(link = "probit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.298 -0.147 0.360 0.518 1.487
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -17.9611 6.6260 -2.71 0.0067 **
sat 0.0334 0.0119 2.79 0.0052 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 36.652 on 29 degrees of freedom
Residual deviance: 22.233 on 28 degrees of freedom
AIC: 26.23
Number of Fisher Scoring iterations: 6
# One can simulate from the posterior distribution by means of the data
# augmentation/Gibbs sampling algorithm of Albert and Chib (1993).
# This fitting is done by the function bayes.probit in the
# LearnBayes package.
# The inputs are the vector of binary responses, the covariate matrix,
# and the number of simulations required. The output is a list:
# component beta is a matrix, where
# each row corresponds to a single draw of the regression
# vector (beta0, beta1).
library(LearnBayes)
sim.par <- bayes.probit(grade, cbind(1, sat), 10000)
# We can summarize the simulated draws by the computation of the 5th,
# 50th, 95th percentiles. The 50th percentile is a point estimate and
# the 5th and 95th percentiles form a 90% interval estimate for betai.
apply(sim.par$beta, 2, quantile, c(0.05, 0.5, 0.95))
[,1] [,2]
5% -31.781 0.01903
50% -19.549 0.03631
95% -9.991 0.05838
# The following command constructs a density estimate of the simulated
# draws of the slope parameter beta1. Note that most of the mass is on
# positive values, indicating that there is a high probability that beta1
# is positive and there is an increasing relationship between SAT score
# and the probability of passing the class.
plot(density(sim.par$beta[, 2]), xlab="beta1")
# To look further at the relationship between SAT score and course
# success, we can focus on the probability of passing
# p = F(beta0 + beta1 SAT) for specific SAT scores.
# We can obtain simulated draws from the posterior of the fitted
# probability using the function bprobit.probs.
# First, we define three covariate vectors of interest corresponding
# to SAT scores 500, 530, and 560. We stack the
# vectors in the matrix covariates.
cov1 <- c(1, 500)
cov2 <- c(1, 530)
cov3 <- c(1, 560)
covariates <- rbind(cov1, cov2, cov3)
# We then use the function bprobit.probs; the two inputs are the
# matrix of covariate and the matrix of simulated draws from
# bayes.probit. The output is a matrix of simulated draws fitted.probs,
# where each column corresponds to a particular covariate vector.
fitted.probs <- bprobit.probs(covariates, sim.par$beta)
# We summarize each posterior density of p = F(beta0 + beta1 SAT) by a
# density estimate. We place all three density estimates on the same
# graph; this clearly shows how the passing probability increases as
# the SAT score increases.
plot(density(fitted.probs[, 1]), xlim = c(0, 1),
lwd = 2, xlab = "Probability of passing")
lines(density(fitted.probs[, 2]), lwd = 2)
lines(density(fitted.probs[, 3]), lwd = 2)
text(0.18, 4, "SAT=500")
text(0.38, 3, "SAT=530")
text(0.8, 5, "SAT=560")