Introduction

The ProbBayes package was written to accompany the text Probability and Bayesian Modeling by Jim Albert and Jingchen Hu. The package includes a number of functions to illustrate Bayesian computations described in the text. In addition, ProbBayes contains all datasets used in the examples and exercises. The purpose of this document is to provide an overview of the functions contained in the package.

Installing ProbBayes

The ProbBayes package is available on CRAN. The package is installed by use of the install.packages() function.

install.packages("ProbBayes")

To load the package, use the library() function.

library(ProbBayes)

(Note: The ProbBayes package depends on the LearnBayes, ggplot2, gridExtra, and shiny packages. It may be necessary to also install these prerequisite packages for ProbBayes to correctly install.)

Datasets

All datasets mentioned in Probability and Bayesian Modeling are contained in the ProbBayes package. For example, the dataset pt99price.csv contains the point prices of 0.99 carat diamonds in Exercise 16 of Chapter 8. This data is contained in the data frame pt99price in ProbBayes.

head(pt99price)
##   diamond    price
## 1       1 28.39394
## 2       2 28.40404
## 3       3 29.78788
## 4       4 33.70707
## 5       5 36.29293
## 6       6 40.42424

Functions for Chapter 3: Bayes’ Rule

Learning About a Spinner

This chapter introduces Bayes’ rule by use of a spinner example. Suppose there are four possible spinners defined by four vectors. The length of a vector corresponds to the number of areas and the values are proportional to the areas.

s1 <- c(2, 2, 2, 2)
s2 <- c(4, 1, 1, 2)
s3 <- c(2, 4, 2)
s4 <- c(1, 3, 3, 1)

We can display these spinners by use of the many_spinner_plots() function.

many_spinner_plots(list(s1, s2, s3, s4))

Initially, suppose you have little knowledge about the actual spinner and so you place a uniform prior on these spinners. We construct a data frame with the four models and the associated prior probabilities.

(bayes_table <- data.frame(Model = 
                paste("Spinner", c("A", "B", "C", "D")),
                Prior = c(1, 1, 1, 1) / 4))
##       Model Prior
## 1 Spinner A  0.25
## 2 Spinner B  0.25
## 3 Spinner C  0.25
## 4 Spinner D  0.25

Suppose a spinner is chosen at random and the outcome is “1”. The likelihood is the chance of observing “1” for each spinner.

library(dplyr)
(bayes_table %>% 
  mutate(Likelihood = c(1/4, 1/2, 1/4, 1/8)) ->
bayes_table)
##       Model Prior Likelihood
## 1 Spinner A  0.25      0.250
## 2 Spinner B  0.25      0.500
## 3 Spinner C  0.25      0.250
## 4 Spinner D  0.25      0.125
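
These likelihood values can be computed directly from the spinner vectors. Here is a quick sketch, assuming the first element of each vector is the area of the region labeled “1”:

# P(spin lands 1) is the area of region 1 divided by the total area
sapply(list(s1, s2, s3, s4), function(s) s[1] / sum(s))

This reproduces the values 1/4, 1/2, 1/4, and 1/8 entered above.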

The bayesian_crank() function updates the probabilities given that the spinner landed on “1”.

(bayes_table <- bayesian_crank(bayes_table))
##       Model Prior Likelihood Product Posterior
## 1 Spinner A  0.25      0.250 0.06250 0.2222222
## 2 Spinner B  0.25      0.500 0.12500 0.4444444
## 3 Spinner C  0.25      0.250 0.06250 0.2222222
## 4 Spinner D  0.25      0.125 0.03125 0.1111111
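
The bayesian_crank() function implements Bayes’ rule: it multiplies each prior probability by the corresponding likelihood and normalizes the products so they sum to one. The Posterior column can be reproduced by hand:

with(bayes_table, Prior * Likelihood / sum(Prior * Likelihood))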

The prior_post_plot() function plots the prior and posterior probabilities of the spinners. We see that it is most likely that the unknown spinner is Spinner B.

prior_post_plot(bayes_table)

Functions for Chapter 7: Learning about a Binomial Probability

Using a Discrete Prior

We introduce Bayesian thinking about a proportion using a discrete prior. Suppose you are interested in estimating the proportion \(p\) of coffee drinkers on campus and your prior on \(p\) is uniform on the values \(p\) = 0.1, 0.2, …, 0.9. You take a sample of 20 students and you find 12 coffee drinkers. What have you learned about \(p\)?

Construct a data frame with your prior.

(df <- data.frame(p = seq(0.1, 0.9, by = 0.1),
                 Prior = rep(1/9, 9)))
##     p     Prior
## 1 0.1 0.1111111
## 2 0.2 0.1111111
## 3 0.3 0.1111111
## 4 0.4 0.1111111
## 5 0.5 0.1111111
## 6 0.6 0.1111111
## 7 0.7 0.1111111
## 8 0.8 0.1111111
## 9 0.9 0.1111111

The likelihood is the binomial probability of 12 successes out of 20 when the probability of success is \(p\).

(df %>% 
  mutate(Likelihood = dbinom(12, size = 20, prob = p)) -> df)
##     p     Prior   Likelihood
## 1 0.1 0.1111111 5.422595e-08
## 2 0.2 0.1111111 8.656592e-05
## 3 0.3 0.1111111 3.859282e-03
## 4 0.4 0.1111111 3.549744e-02
## 5 0.5 0.1111111 1.201344e-01
## 6 0.6 0.1111111 1.797058e-01
## 7 0.7 0.1111111 1.143967e-01
## 8 0.8 0.1111111 2.216088e-02
## 9 0.9 0.1111111 3.557765e-04

Update the probabilities using the bayesian_crank() function.

(df <- bayesian_crank(df))
##     p     Prior   Likelihood      Product    Posterior
## 1 0.1 0.1111111 5.422595e-08 6.025106e-09 1.138730e-07
## 2 0.2 0.1111111 8.656592e-05 9.618436e-06 1.817860e-04
## 3 0.3 0.1111111 3.859282e-03 4.288091e-04 8.104383e-03
## 4 0.4 0.1111111 3.549744e-02 3.944160e-03 7.454362e-02
## 5 0.5 0.1111111 1.201344e-01 1.334826e-02 2.522788e-01
## 6 0.6 0.1111111 1.797058e-01 1.996731e-02 3.773771e-01
## 7 0.7 0.1111111 1.143967e-01 1.271075e-02 2.402299e-01
## 8 0.8 0.1111111 2.216088e-02 2.462320e-03 4.653722e-02
## 9 0.9 0.1111111 3.557765e-04 3.953072e-05 7.471206e-04

Show the prior and posterior for \(p\). We see that it is very likely that \(p\) is 0.5, 0.6, or 0.7.

prior_post_plot(df)
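
To quantify this statement, we can add the posterior probabilities of these three values, which from the table above is roughly 0.87.

sum(df$Posterior[df$p %in% c(0.5, 0.6, 0.7)])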

Using a Beta Prior

Using the same example, suppose you want to use a beta prior to model your beliefs about \(p\), the proportion of coffee drinkers on campus. Your median for \(p\) is 0.40 and you are 90% sure that \(p\) is smaller than 0.55. The beta.select() function will find the shape parameters of the beta prior that matches these statements.

(ab <- beta.select(list(p = 0.5, x = 0.4),
                   list(p = 0.9, x = 0.55)))
## [1]  7.53 11.13
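
As a quick check, the qbeta() function confirms that this beta curve approximately matches the stated median and 90th percentile.

qbeta(c(0.5, 0.9), ab[1], ab[2])
# approximately 0.40 and 0.55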

(You can graphically find the beta parameters to match statements of the 50th and 90th percentiles by use of a Shiny app accessed by the ChooseBeta() function.)

We observed 12 coffee drinkers and 8 non-coffee drinkers. Because the beta prior is conjugate to the binomial likelihood, the posterior is also beta, with shape parameters found by adding the counts of successes and failures to the prior shape parameters.

(ab_new <- ab + c(12, 8))
## [1] 19.53 19.13

The beta_prior_post() function displays the prior and posterior beta density curves.

beta_prior_post(ab, ab_new)

Several functions, beta_area(), beta_interval(), and beta_quantile(), are available for summarizing the beta posterior.

To illustrate, we use beta_interval() to construct a 90% interval estimate for \(p\).

beta_interval(0.90, ab_new)
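
The same interval can be verified with the base R quantile function for the beta distribution.

qbeta(c(0.05, 0.95), ab_new[1], ab_new[2])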

Functions for Chapter 8: Learning about a Normal Mean

Using a Normal Prior

Suppose a dietician wishes to learn about the mean weight gain \(\mu\) of freshman college students in the first month of college. His best guess a priori at \(\mu\) is 3 lbs and he is 90% sure that \(\mu\) is smaller than 5 lbs.

We find a matching normal prior by use of the normal.select() function.

(normal_prior <- normal.select(list(p = .5, x = 3),
                              list(p = .9, x = 5)))
## $mu
## [1] 3
## 
## $sigma
## [1] 1.560608

We see that a N(3, 1.56) prior matches his beliefs.
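
A quick check with qnorm() confirms that the 90th percentile of this prior is close to 5 lbs.

qnorm(0.9, 3, 1.56)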

Suppose a sample of students gives \(\bar y = 2\) with a standard error of \(se = 0.5\). One can find the parameters of the normal posterior with the normal_update() function.

prior <- c(3, 1.56)
data <- c(2, 0.5)
normal_update(prior, data, teach = TRUE)
##        Type     Mean Precision Stand_Dev
## 1     Prior 3.000000 0.4109139 1.5600000
## 2      Data 2.000000 4.0000000 0.5000000
## 3 Posterior 2.093158 4.4109139 0.4761411

We see the posterior for \(\mu\) is N(2.09, 0.48).
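
As a check on these values, the posterior precision is the sum of the prior and data precisions, and the posterior mean is the precision-weighted average of the prior mean and \(\bar y\):

prior_precision <- 1 / 1.56 ^ 2
data_precision <- 1 / 0.5 ^ 2
post_precision <- prior_precision + data_precision
post_mean <- (prior_precision * 3 + data_precision * 2) / post_precision
# reproduces the posterior mean and standard deviation shown above
c(post_mean, 1 / sqrt(post_precision))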

The functions normal_area(), normal_interval() and normal_quantile() are helpful for computing various summaries of the normal posterior.

What is the posterior probability that \(\mu\) is between 1 and 3 lbs?

normal_area(1, 3, c(2.09, 0.48))
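
The same probability can be checked with the normal cdf in base R.

pnorm(3, 2.09, 0.48) - pnorm(1, 2.09, 0.48)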

Functions for Chapter 9: Introduction to MCMC

Several functions, random_walk(), metropolis(), gibbs_discrete(), gibbs_normal(), and gibbs_betabin(), are discussed in the text.

Using the Metropolis Algorithm

We illustrate the use of metropolis() to simulate from a real-valued posterior of a single parameter.

In the Cauchy-Normal problem discussed in Section 9.4, suppose \(\mu\) is the mean monthly snowfall and we assign \(\mu\) a Cauchy prior with location 10 and scale 2. We observe some data: the sample mean is \(\bar y = 26.78\) inches with a corresponding standard error of 3.24 inches.

We program the log posterior:

cn_logpost <- function(mu, loc, scale, ybar, se){
  dcauchy(mu, loc, scale, log = TRUE) +
    dnorm(ybar, mu, se, log = TRUE)
}

We use Metropolis sampling where (1) the starting value is 10, (2) the width of the proposal density is 20, and (3) we take 5000 iterations.

post <- metropolis(cn_logpost, 10, 20, 5000,
                10, 2, 26.78, 3.24)

We check the acceptance rate of the algorithm; for a random walk Metropolis sampler, a rate roughly in the 20-40% range is generally regarded as reasonable.

post$accept_rate
## [1] 0.2688

By using the coda package, we display a trace plot and density plot of the simulated draws.

library(coda)
plot(mcmc(post$S))

We use the coda package to get posterior summaries.

summary(mcmc(post$S))
## 
## Iterations = 1:5000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##       25.44596        3.40757        0.04819        0.10744 
## 
## 2. Quantiles for each variable:
## 
##  2.5%   25%   50%   75% 97.5% 
## 18.79 23.19 25.51 27.85 31.79
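
Since post$S contains the simulated draws of \(\mu\), other summaries can be computed directly from this vector. For example, we can estimate the posterior probability that the mean snowfall exceeds 20 inches.

mean(post$S > 20)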

Functions for Chapter 10: Hierarchical Modeling

Beta-Binomial Modeling

Section 10.3 illustrates the use of a hierarchical model where

  • \(y_j \sim\) binomial(\(n_j, p_j\)) and \(p_j \sim\) beta(\(a, b\)), \(j = 1, ..., N\), where \(a = \eta \mu\) and \(b = \eta (1 - \mu)\).
  • The hyperparameters are given the priors \(\mu \sim\) beta(\(a_\mu, b_\mu\)) and \(\log \eta \sim\) Logistic(\(\log n^*, 1\)).

We illustrate fitting this model to the famous Efron and Morris baseball dataset stored in the pscl package.

library(pscl)
(select(EfronMorris, name, r) %>%
      mutate(n = 45) -> d)
##                name  r  n
## 1  Roberto Clemente 18 45
## 2    Frank Robinson 17 45
## 3      Frank Howard 16 45
## 4     Jay Johnstone 15 45
## 5         Ken Berry 14 45
## 6       Jim Spencer 14 45
## 7     Don Kessinger 13 45
## 8     Luis Alvarado 12 45
## 9         Ron Santo 11 45
## 10      Ron Swoboda 11 45
## 11        Del Unser 10 45
## 12   Billy Williams 10 45
## 13     George Scott 10 45
## 14  Rico Petrocelli 10 45
## 15  Ellie Rodriguez 10 45
## 16  Bert Campaneris  9 45
## 17   Thurman Munson  8 45
## 18        Max Alvis  7 45

We store the data and the parameters of the hyperprior distributions in the list the_data.

the_data <- list(y = d$r, n = d$n, N = 18,
                 "mua" = 1, "mub" = 1,
                 "logn" = log(45))

The JAGS script for this model is stored in the function JAGS_script() with the “beta_binomial” argument. We fit this model by MCMC using the run.jags() function in the runjags package.

library(runjags)
posterior <- run.jags(JAGS_script("beta_binomial"),
                          n.chains = 1,
                          data = the_data,
                monitor = c("p", "mu", "eta"),
                          adapt = 1000,
                          burnin = 10000,
                          sample = 20000)
## Compiling rjags model...
## Calling the simulation using the rjags method...
## Adapting the model for 1000 iterations...
## Burning in the model for 10000 iterations...
## Running the model for 20000 iterations...
## Simulation complete
## Calculating summary statistics...
## Finished running the simulation

Posterior summaries of the parameters \(p_1, ..., p_N, \mu, \eta\) are obtained by the print() method.

print(posterior, digits = 3)
## 
## JAGS model summary statistics from 20000 samples (adapt+burnin = 11000):
##                                                                             
##       Lower95 Median Upper95  Mean     SD Mode    MCerr MC%ofSD SSeff  AC.10
## p[1]    0.226  0.293   0.388 0.299 0.0427   --   0.0012     2.8  1274  0.187
## p[2]    0.221  0.289   0.378 0.294 0.0403   -- 0.000993     2.5  1648  0.179
## p[3]    0.215  0.284   0.366 0.288 0.0385   -- 0.000852     2.2  2039  0.155
## p[4]    0.216   0.28   0.367 0.284 0.0383   -- 0.000894     2.3  1832  0.112
## p[5]    0.206  0.275   0.354 0.278 0.0368   -- 0.000687     1.9  2863 0.0854
## p[6]    0.209  0.274   0.353 0.277  0.036   -- 0.000685     1.9  2764 0.0757
## p[7]    0.205   0.27   0.349 0.272 0.0358   -- 0.000601     1.7  3547 0.0772
## p[8]    0.195  0.265   0.334 0.266 0.0346   -- 0.000581     1.7  3532 0.0629
## p[9]    0.191  0.261   0.334 0.262 0.0352   -- 0.000629     1.8  3120 0.0871
## p[10]   0.189  0.262   0.331 0.262 0.0355   -- 0.000679     1.9  2729 0.0712
## p[11]   0.185  0.257   0.325 0.257 0.0349   -- 0.000723     2.1  2331 0.0954
## p[12]   0.183  0.257   0.322 0.256 0.0349   -- 0.000658     1.9  2818 0.0894
## p[13]   0.186  0.257   0.327 0.256 0.0347   -- 0.000621     1.8  3118 0.0785
## p[14]    0.18  0.257   0.325 0.256 0.0357   -- 0.000619     1.7  3323 0.0795
## p[15]   0.182  0.257   0.321 0.256 0.0351   -- 0.000585     1.7  3603 0.0628
## p[16]   0.175  0.252   0.317  0.25 0.0357   -- 0.000744     2.1  2302  0.128
## p[17]   0.167  0.248   0.314 0.245  0.037   -- 0.000813     2.2  2070  0.126
## p[18]   0.161  0.242   0.306 0.239 0.0371   -- 0.000955     2.6  1511  0.169
## mu      0.232  0.267   0.303 0.267 0.0181   --  0.00052     2.9  1216  0.302
## eta      13.2    158    1724   663   4177   --      283     6.8   217  0.544
##           
##       psrf
## p[1]    --
## p[2]    --
## p[3]    --
## p[4]    --
## p[5]    --
## p[6]    --
## p[7]    --
## p[8]    --
## p[9]    --
## p[10]   --
## p[11]   --
## p[12]   --
## p[13]   --
## p[14]   --
## p[15]   --
## p[16]   --
## p[17]   --
## p[18]   --
## mu      --
## eta     --
## 
## Total time taken: 2.7 seconds
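
To work with the simulated draws directly, one can extract them from the runjags object. Here is a sketch, assuming the draws are available in the mcmc component of the fitted object, that compares the posterior means of the \(p_j\) with the raw proportions \(y_j / n_j\) to display the shrinkage toward the overall mean.

# extract the single chain of simulated draws as a data frame
post_draws <- as.data.frame(posterior$mcmc[[1]])
# the first 18 columns contain the draws of p[1], ..., p[18]
p_means <- colMeans(post_draws[, 1:18])
cbind(d, raw = d$r / d$n, post_mean = round(p_means, 3))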