The ProbBayes package was written to accompany the text Probability and Bayesian Modeling by Jim Albert and Jingchen Hu. The package includes a number of functions to illustrate the Bayesian computations described in the text. In addition, ProbBayes contains all datasets used in the examples and exercises. The purpose of this document is to provide an overview of the functions contained in the package.
The ProbBayes package is available on CRAN. The package is installed by use of the install.packages() function.
install.packages("ProbBayes")
To load the package, use the library() function.
library(ProbBayes)
(Note: The ProbBayes package depends on the LearnBayes, ggplot2, gridExtra, and shiny packages. It may be necessary to also install these prerequisite packages for ProbBayes to install correctly.)
All datasets mentioned in Probability and Bayesian Modeling are contained in the ProbBayes package. For example, the dataset pt99price.csv contains the point prices of 0.99-carat diamonds in Exercise 16 of Chapter 8. These data are contained in the data frame pt99price in ProbBayes.
head(pt99price)
## diamond price
## 1 1 28.39394
## 2 2 28.40404
## 3 3 29.78788
## 4 4 33.70707
## 5 5 36.29293
## 6 6 40.42424
This chapter introduces Bayes’ rule by use of a spinner example. Suppose there are four possible spinners defined by four vectors. The length of a vector corresponds to the number of regions on the spinner and the values are proportional to the areas of the regions.
s1 <- c(2, 2, 2, 2)
s2 <- c(4, 1, 1, 2)
s3 <- c(2, 4, 2)
s4 <- c(1, 3, 3, 1)
We can see these spinners by use of the many_spinner_plots() function.
many_spinner_plots(list(s1, s2, s3, s4))
Initially, suppose you have little knowledge about the actual spinner and so you place a uniform prior on these spinners. We construct a data frame with the four models and the associated prior probabilities.
(bayes_table <- data.frame(Model = paste("Spinner", c("A", "B", "C", "D")),
                           Prior = c(1, 1, 1, 1) / 4))
## Model Prior
## 1 Spinner A 0.25
## 2 Spinner B 0.25
## 3 Spinner C 0.25
## 4 Spinner D 0.25
Suppose a spinner is chosen at random and the outcome is “1”. The likelihood is the chance of observing “1” for each spinner.
library(dplyr)
(bayes_table %>%
   mutate(Likelihood = c(1/4, 1/2, 1/4, 1/8)) ->
   bayes_table)
## Model Prior Likelihood
## 1 Spinner A 0.25 0.250
## 2 Spinner B 0.25 0.500
## 3 Spinner C 0.25 0.250
## 4 Spinner D 0.25 0.125
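These likelihood values can also be computed directly from the area vectors; this is a small check, assuming the first element of each vector is the area of the region labeled “1”.
# chance of landing "1" = area of region 1 divided by the total area
# gives 0.250, 0.500, 0.250, 0.125 for spinners A through D
sapply(list(s1, s2, s3, s4), function(s) s[1] / sum(s))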
The bayesian_crank() function will update the probabilities given that the spinner landed “1”.
(bayes_table <- bayesian_crank(bayes_table))
## Model Prior Likelihood Product Posterior
## 1 Spinner A 0.25 0.250 0.06250 0.2222222
## 2 Spinner B 0.25 0.500 0.12500 0.4444444
## 3 Spinner C 0.25 0.250 0.06250 0.2222222
## 4 Spinner D 0.25 0.125 0.03125 0.1111111
The prior_post_plot() function plots the prior and posterior probabilities of the spinners. We see that it is most likely that the unknown spinner is Spinner B.
prior_post_plot(bayes_table)
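A quick numerical check of the plot’s message picks out the spinner with the largest posterior probability.
# Spinner B has the largest posterior probability (about 0.44)
bayes_table$Model[which.max(bayes_table$Posterior)]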
We introduce Bayesian thinking about a proportion using a discrete prior. Suppose you are interested in estimating the proportion \(p\) of coffee drinkers on campus and your prior on \(p\) is uniform on the values \(p\) = 0.1, 0.2, …, 0.9. You take a sample of 20 students and you find 12 coffee drinkers. What have you learned about \(p\)?
Construct a data frame with your prior.
(df <- data.frame(p = seq(0.1, 0.9, by = 0.1),
                  Prior = rep(1/9, 9)))
## p Prior
## 1 0.1 0.1111111
## 2 0.2 0.1111111
## 3 0.3 0.1111111
## 4 0.4 0.1111111
## 5 0.5 0.1111111
## 6 0.6 0.1111111
## 7 0.7 0.1111111
## 8 0.8 0.1111111
## 9 0.9 0.1111111
The likelihood is the binomial probability of 12 successes out of 20 when the probability of success is \(p\).
(df %>%
   mutate(Likelihood = dbinom(12, size = 20, prob = p)) -> df)
## p Prior Likelihood
## 1 0.1 0.1111111 5.422595e-08
## 2 0.2 0.1111111 8.656592e-05
## 3 0.3 0.1111111 3.859282e-03
## 4 0.4 0.1111111 3.549744e-02
## 5 0.5 0.1111111 1.201344e-01
## 6 0.6 0.1111111 1.797058e-01
## 7 0.7 0.1111111 1.143967e-01
## 8 0.8 0.1111111 2.216088e-02
## 9 0.9 0.1111111 3.557765e-04
Update the probabilities using the bayesian_crank() function.
(df <- bayesian_crank(df))
## p Prior Likelihood Product Posterior
## 1 0.1 0.1111111 5.422595e-08 6.025106e-09 1.138730e-07
## 2 0.2 0.1111111 8.656592e-05 9.618436e-06 1.817860e-04
## 3 0.3 0.1111111 3.859282e-03 4.288091e-04 8.104383e-03
## 4 0.4 0.1111111 3.549744e-02 3.944160e-03 7.454362e-02
## 5 0.5 0.1111111 1.201344e-01 1.334826e-02 2.522788e-01
## 6 0.6 0.1111111 1.797058e-01 1.996731e-02 3.773771e-01
## 7 0.7 0.1111111 1.143967e-01 1.271075e-02 2.402299e-01
## 8 0.8 0.1111111 2.216088e-02 2.462320e-03 4.653722e-02
## 9 0.9 0.1111111 3.557765e-04 3.953072e-05 7.471206e-04
Show the prior and posterior for \(p\). We see that it is very likely that \(p\) is 0.5, 0.6, or 0.7.
prior_post_plot(df)
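The statement about \(p\) can be confirmed by summing the relevant posterior probabilities.
# rows 5, 6, 7 of df correspond to p = 0.5, 0.6, 0.7;
# their total posterior probability is about 0.87
sum(df$Posterior[5:7])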
Using the same example, suppose you want to use a beta prior to model your beliefs about \(p\), the proportion of coffee drinkers on campus. Your median for \(p\) is 0.40 and you are 90% sure that \(p\) is smaller than 0.55. The beta.select() function will find the shape parameters of the beta prior that matches these statements.
(ab <- beta.select(list(p = 0.5, x = 0.4),
                   list(p = 0.9, x = 0.55)))
## [1] 7.53 11.13
(You can graphically find the beta parameters to match statements of the 50th and 90th percentiles by use of a Shiny app accessed by the ChooseBeta() function.)
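As a quick check with base R, the cumulative probabilities of the beta(7.53, 11.13) prior at 0.40 and 0.55 should be close to the stated values of 0.50 and 0.90.
# prior probabilities that p is smaller than 0.40 and 0.55
pbeta(c(0.40, 0.55), ab[1], ab[2])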
We observed 12 coffee drinkers and 8 non-coffee drinkers. We update the beta parameters by adding these counts to obtain the beta posterior.
(ab_new <- ab + c(12, 8))
## [1] 19.53 19.13
The beta_prior_post() function shows the prior and posterior beta densities.
beta_prior_post(ab, ab_new)
There are several functions, beta_area(), beta_interval(), and beta_quantile(), to summarize the beta posterior. To illustrate, we use beta_interval() to construct a 90% interval for \(p\).
beta_interval(0.90, ab_new)
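Assuming beta_area() takes a lower and upper limit followed by the vector of shape parameters (analogous to the normal_area() call shown later), one could also find the posterior probability that \(p\) exceeds 0.5; the qbeta() line is a base-R check of the 90% interval endpoints.
# posterior probability that p exceeds 0.5 (assumed signature: beta_area(lo, hi, shape_pars))
beta_area(0.5, 1, ab_new)
# base-R check of the 90% interval endpoints
qbeta(c(0.05, 0.95), ab_new[1], ab_new[2])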
Suppose a dietician wishes to learn about the mean weight gain \(\mu\) of freshmen college students in the first month of college. His best guess a priori at \(\mu\) is 3 lbs and he is 90% sure that \(\mu\) is smaller than 5 lbs.
We find a matching normal prior by use of the normal.select() function.
(normal_prior <- normal.select(list(p = .5, x = 3),
                               list(p = .9, x = 5)))
## $mu
## [1] 3
##
## $sigma
## [1] 1.560608
We see that a N(3, 1.56) prior matches his beliefs.
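A base-R check confirms that this prior has median 3 and 90th percentile close to 5.
# 50th and 90th percentiles of the N(3, 1.56) prior (about 3 and 5)
qnorm(c(0.5, 0.9), normal_prior$mu, normal_prior$sigma)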
Suppose a sample of students gives \(\bar y = 2\) with a standard error of \(se = 0.5\). One can find the parameters of the normal posterior by the normal_update() function.
prior <- c(3, 1.56)
data <- c(2, 0.5)
normal_update(prior, data, teach = TRUE)
## Type Mean Precision Stand_Dev
## 1 Prior 3.000000 0.4109139 1.5600000
## 2 Data 2.000000 4.0000000 0.5000000
## 3 Posterior 2.093158 4.4109139 0.4761411
We see the posterior for \(\mu\) is N(2.09, 0.48).
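The same posterior can be reproduced by hand with the precision-weighted updating formulas, where precision is the reciprocal of the variance.
prior_prec <- 1 / 1.56 ^ 2             # prior precision, about 0.411
data_prec <- 1 / 0.5 ^ 2               # data precision, 4
post_prec <- prior_prec + data_prec    # posterior precision, about 4.411
post_mean <- (prior_prec * 3 + data_prec * 2) / post_prec
post_sd <- sqrt(1 / post_prec)
c(post_mean, post_sd)                  # about 2.09 and 0.48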
The functions normal_area(), normal_interval(), and normal_quantile() are helpful for computing various summaries of the normal posterior.
What is the posterior probability that \(\mu\) is between 1 and 3 lbs?
normal_area(1, 3, c(2.09, 0.48))
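The area can be checked with base R, and, assuming normal_interval() takes a probability followed by the vector of mean and standard deviation (analogous to beta_interval()), a 90% interval for \(\mu\) could be found as in this sketch.
# base-R check of the area between 1 and 3
pnorm(3, 2.09, 0.48) - pnorm(1, 2.09, 0.48)
# 90% interval for mu (assumed signature: normal_interval(prob, c(mean, sd)))
normal_interval(0.90, c(2.09, 0.48))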
There are several functions, random_walk(), metropolis(), gibbs_discrete(), gibbs_normal(), and gibbs_betabin(), that are discussed in the text. We illustrate the use of metropolis() to simulate from a real-valued posterior of a single parameter.
In the Cauchy-Normal problem discussed in Section 9.4, suppose \(\mu\) is the mean monthly snowfall and we assign \(\mu\) a Cauchy prior with location 10 and scale 2. We observe some data – the sample mean is \(\bar y = 26.78\) inches with corresponding standard error of 3.24 inches.
We program the log posterior:
cn_logpost <- function(mu, loc, scale, ybar, se){
  dcauchy(mu, loc, scale, log = TRUE) +
    dnorm(ybar, mu, se, log = TRUE)
}
We use Metropolis sampling where (1) the starting value is 10, (2) the width of the proposal density is 20, and (3) we take 5000 iterations.
post <- metropolis(cn_logpost, 10, 20, 5000,
                   10, 2, 26.78, 3.24)
We check the acceptance rate of the algorithm.
post$accept_rate
## [1] 0.2688
By using the coda package, we display a trace plot and density plot of the simulated draws.
library(coda)
plot(mcmc(post$S))
We use the coda package to get posterior summaries.
summary(mcmc(post$S))
##
## Iterations = 1:5000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 25.44596 3.40757 0.04819 0.10744
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 18.79 23.19 25.51 27.85 31.79
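The same summaries can be computed directly from the vector of simulated draws stored in post$S.
# posterior mean, standard deviation, and 90% probability interval from the raw draws
c(mean(post$S), sd(post$S))
quantile(post$S, c(0.05, 0.95))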
Section 10.3 illustrates the use of a hierarchical beta-binomial model, where each hit count \(y_i\) is binomial with sample size \(n_i\) and hitting probability \(p_i\), and the probabilities \(p_1, ..., p_N\) come from a common beta distribution with mean \(\mu\) and precision \(\eta\).
We illustrate fitting this model on the famous Efron and Morris baseball dataset stored in the pscl package.
library(pscl)
(select(EfronMorris, name, r) %>%
   mutate(n = 45) -> d)
## name r n
## 1 Roberto Clemente 18 45
## 2 Frank Robinson 17 45
## 3 Frank Howard 16 45
## 4 Jay Johnstone 15 45
## 5 Ken Berry 14 45
## 6 Jim Spencer 14 45
## 7 Don Kessinger 13 45
## 8 Luis Alvarado 12 45
## 9 Ron Santo 11 45
## 10 Ron Swoboda 11 45
## 11 Del Unser 10 45
## 12 Billy Williams 10 45
## 13 George Scott 10 45
## 14 Rico Petrocelli 10 45
## 15 Ellie Rodriguez 10 45
## 16 Bert Campaneris 9 45
## 17 Thurman Munson 8 45
## 18 Max Alvis 7 45
We store the data and the parameters of the hyperprior distributions in the list the_data.
the_data <- list(y = d$r, n = d$n, N = 18,
                 "mua" = 1, "mub" = 1,
                 "logn" = log(45))
The JAGS script for this model is stored in the function JAGS_script() with the “beta_binomial” argument. We fit this model by MCMC using the run.jags() function in the runjags package.
library(runjags)
posterior <- run.jags(JAGS_script("beta_binomial"),
                      n.chains = 1,
                      data = the_data,
                      monitor = c("p", "mu", "eta"),
                      adapt = 1000,
                      burnin = 10000,
                      sample = 20000)
## Compiling rjags model...
## Calling the simulation using the rjags method...
## Adapting the model for 1000 iterations...
## Burning in the model for 10000 iterations...
## Running the model for 20000 iterations...
## Simulation complete
## Calculating summary statistics...
## Finished running the simulation
Posterior summaries of the parameters \(p_1, ..., p_N, \mu, \eta\) are obtained by the print() method.
print(posterior, digits = 3)
##
## JAGS model summary statistics from 20000 samples (adapt+burnin = 11000):
##
## Lower95 Median Upper95 Mean SD Mode MCerr MC%ofSD SSeff AC.10
## p[1] 0.226 0.293 0.388 0.299 0.0427 -- 0.0012 2.8 1274 0.187
## p[2] 0.221 0.289 0.378 0.294 0.0403 -- 0.000993 2.5 1648 0.179
## p[3] 0.215 0.284 0.366 0.288 0.0385 -- 0.000852 2.2 2039 0.155
## p[4] 0.216 0.28 0.367 0.284 0.0383 -- 0.000894 2.3 1832 0.112
## p[5] 0.206 0.275 0.354 0.278 0.0368 -- 0.000687 1.9 2863 0.0854
## p[6] 0.209 0.274 0.353 0.277 0.036 -- 0.000685 1.9 2764 0.0757
## p[7] 0.205 0.27 0.349 0.272 0.0358 -- 0.000601 1.7 3547 0.0772
## p[8] 0.195 0.265 0.334 0.266 0.0346 -- 0.000581 1.7 3532 0.0629
## p[9] 0.191 0.261 0.334 0.262 0.0352 -- 0.000629 1.8 3120 0.0871
## p[10] 0.189 0.262 0.331 0.262 0.0355 -- 0.000679 1.9 2729 0.0712
## p[11] 0.185 0.257 0.325 0.257 0.0349 -- 0.000723 2.1 2331 0.0954
## p[12] 0.183 0.257 0.322 0.256 0.0349 -- 0.000658 1.9 2818 0.0894
## p[13] 0.186 0.257 0.327 0.256 0.0347 -- 0.000621 1.8 3118 0.0785
## p[14] 0.18 0.257 0.325 0.256 0.0357 -- 0.000619 1.7 3323 0.0795
## p[15] 0.182 0.257 0.321 0.256 0.0351 -- 0.000585 1.7 3603 0.0628
## p[16] 0.175 0.252 0.317 0.25 0.0357 -- 0.000744 2.1 2302 0.128
## p[17] 0.167 0.248 0.314 0.245 0.037 -- 0.000813 2.2 2070 0.126
## p[18] 0.161 0.242 0.306 0.239 0.0371 -- 0.000955 2.6 1511 0.169
## mu 0.232 0.267 0.303 0.267 0.0181 -- 0.00052 2.9 1216 0.302
## eta 13.2 158 1724 663 4177 -- 283 6.8 217 0.544
##
## psrf
## p[1] --
## p[2] --
## p[3] --
## p[4] --
## p[5] --
## p[6] --
## p[7] --
## p[8] --
## p[9] --
## p[10] --
## p[11] --
## p[12] --
## p[13] --
## p[14] --
## p[15] --
## p[16] --
## p[17] --
## p[18] --
## mu --
## eta --
##
## Total time taken: 2.7 seconds
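To work with the simulated draws directly, one can convert them to a matrix; the sketch below assumes the draws are stored in the mcmc component of the runjags object. Comparing the posterior means of the \(p_i\) with the observed rates \(r / n\) displays the shrinkage of the individual estimates toward the overall mean.
# combine the chains into one matrix of draws (coda is already loaded)
draws <- as.matrix(posterior$mcmc)
# columns p[1], ..., p[18] hold the draws of the hitting probabilities
p_cols <- grep("^p\\[", colnames(draws))
post_means <- colMeans(draws[, p_cols])
round(cbind(observed = d$r / d$n, posterior_mean = post_means), 3)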