Chapter 10 Federalist Paper Study

10.1 Packages for this example

library(tidyverse)
library(brms)
library(bayesplot)
library(ProbBayes)

10.2 Federalist paper data

The data frame federalist_word_study contains word frequencies for the Federalist Papers written by either Alexander Hamilton or James Madison.

We’ll focus on the frequencies of the word “can” in groups of 1000 words written by Hamilton.

federalist_word_study %>% 
  filter(word == "can",
         Authorship == "Hamilton") -> d
head(d)
##                   Name Total word  N         Rate Authorship Disputed
## 65    Federalist No. 1  1622  can  3 0.0018495684   Hamilton       no
## 1526 Federalist No. 11  2511  can  5 0.0019912386   Hamilton       no
## 2437 Federalist No. 12  2171  can  2 0.0009212345   Hamilton       no
## 3125 Federalist No. 13   970  can  4 0.0041237113   Hamilton       no
## 4256 Federalist No. 15  3095  can 14 0.0045234249   Hamilton       no
## 5530 Federalist No. 16  2047  can  1 0.0004885198   Hamilton       no

10.3 The Poisson sampling model

If \(y_i\) represents the count of “can” in the \(i\)th group of words, we assume

\[ y_i \sim \text{Poisson}(n_i \lambda / 1000), \quad i = 1, \ldots, N, \]

where \(\lambda\) is the true rate of the word per 1000 words.

Since \(E(y_i) = n_i \lambda / 1000\), on the log scale the Poisson mean can be written

\[ \log E(y_i) = \log \lambda + \log(n_i / 1000), \] which can be fit as a generalized linear model with Poisson sampling, a log link, and an intercept-only model with an offset of \(\log(n_i / 1000)\).

We complete this model by assigning the prior \[ \log \lambda \sim N(0, 2) \]
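
Before fitting, a quick simulation sketch can make the role of the offset concrete; the value of lambda below is a hypothetical rate, not an estimate from the data.

set.seed(123)
lambda <- 3                          # hypothetical rate per 1000 words
n <- d$Total                         # total words in each group
y_sim <- rpois(length(n), n * lambda / 1000)
summary(1000 * y_sim / n)            # empirical rates center near lambda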

10.4 Fitting the model

We use the brm() function with family = poisson, specifying the offset \(\log(n_i / 1000)\) through the term offset(log(Total / 1000)) in the formula, and specifying the prior through the prior argument.

fit <- brm(data = d, family = poisson,
           N ~ offset(log(Total / 1000)) + 1,
           prior = prior(normal(0, 2), class = Intercept),
           refresh = 0)
## Compiling Stan program...
## Start sampling

We display summaries of the posterior; the Intercept parameter corresponds to \(\log \lambda\).

summary(fit)
##  Family: poisson 
##   Links: mu = log 
## Formula: N ~ offset(log(Total/1000)) + 1 
##    Data: d (Number of observations: 49) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.97      0.06     0.86     1.09 1.00     1602     1974
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We save the simulated draws in the data frame post.

post <- posterior_samples(fit)

The function mcmc_areas() displays a density estimate of the simulated draws and shows the location of a 50% probability interval.

mcmc_areas(post, pars = "b_Intercept")
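
Since the Intercept is \(\log \lambda\), draws of \(\lambda\) itself are obtained by exponentiating. A minimal sketch, assuming the data frame post from above:

lambda_draws <- exp(post$b_Intercept)        # back-transform from the log scale
quantile(lambda_draws, c(0.025, 0.5, 0.975)) # posterior median and 95% interval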

10.5 Model checking

To check whether the Poisson sampling model is appropriate, we illustrate several posterior predictive checks.

Here we display density estimates for 10 replicated samples from the posterior predictive distribution of \(y\) and overlay the observed values as a dark line.

pp_check(fit)
## Using 10 posterior samples for ppc type 'dens_overlay' by default.

Here we use \((\bar y, s_y)\) as a checking function. The scatterplot represents values of \((\bar y, s_y)\) from the posterior predictive distribution of replicated data, and the observed value of \((\bar y, s_y)\) is shown as a dot.

pp_check(fit, type = "stat_2d")
## Using all posterior samples for ppc type 'stat_2d' by default.

The takeaway is that the observed data show more variability (overdispersion) than the Poisson sampling model predicts.
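
This comparison can also be made by hand; a sketch using brms’s posterior_predict() to compute a posterior predictive p-value for \(s_y\):

yrep <- posterior_predict(fit)   # each row is a replicated dataset
sd_rep <- apply(yrep, 1, sd)     # s_y for each replicated dataset
mean(sd_rep >= sd(d$N))          # a value near 0 signals extra variability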

10.6 Negative binomial sampling

One way to handle the extra variability is to assume that the \(y_i\) have a negative binomial distribution. (See the text for details.)
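
In brms’s parameterization, the negative binomial has mean \(\mu_i\) and a shape parameter \(\phi\), with

\[ Var(y_i) = \mu_i + \mu_i^2 / \phi, \]

so smaller values of \(\phi\) allow more variability than the Poisson.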

Here we outline the code for fitting this model, using the brm() function with the family = negbinomial option.

fit_nb <- brm(data = d, family = negbinomial,
              N ~ offset(log(Total / 1000)) + 1,
              refresh = 0)
## Compiling Stan program...
## Start sampling

Here I check the default priors used by brm():

prior_summary(fit_nb)
##                    prior     class coef group resp dpar nlpar bound
## 1 student_t(3, 1.6, 2.5) Intercept                                 
## 2      gamma(0.01, 0.01)     shape
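
These defaults can be overridden through the prior argument, just as in the Poisson fit. A sketch; the normal and gamma priors below are illustrative choices, not values from the text:

fit_nb2 <- brm(data = d, family = negbinomial,
               N ~ offset(log(Total / 1000)) + 1,
               prior = c(prior(normal(0, 2), class = Intercept),
                         prior(gamma(1, 1), class = shape)),
               refresh = 0)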

Here are the posterior summaries.

summary(fit_nb)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: N ~ offset(log(Total/1000)) + 1 
##    Data: d (Number of observations: 49) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.00      0.10     0.81     1.19 1.00     3457     2694
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     3.28      1.10     1.75     5.97 1.00     3450     2535
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

I save the posterior samples in the data frame post.

post <- posterior_samples(fit_nb)

I try the same posterior predictive checks as before. The message is that the negative binomial sampling model is a better fit to these data.

pp_check(fit_nb)
## Using 10 posterior samples for ppc type 'dens_overlay' by default.

pp_check(fit_nb, type = "stat_2d")
## Using all posterior samples for ppc type 'stat_2d' by default.
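
The two models can also be compared formally with approximate leave-one-out cross-validation. This step is not in the text, but a sketch with brms’s loo() functions looks like:

loo_pois <- loo(fit)              # Poisson model
loo_nb <- loo(fit_nb)             # negative binomial model
loo_compare(loo_pois, loo_nb)     # higher elpd indicates better predictive fit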

10.7 Comparing use of a word

Next we compare Madison’s and Hamilton’s use of the word “can”. The data frame d2 contains the data for “can” only for the essays known to be written by Hamilton or Madison.

federalist_word_study %>% 
  filter(word == "can",
         Authorship %in% c("Hamilton", "Madison")) -> d2

Here I fit a regression model for the mean use of “can”, where the single predictor is the categorical variable “Authorship”.

fit_nb <- brm(data = d2, family = negbinomial,
              N ~ offset(log(Total / 1000)) + Authorship,
              refresh = 0)
## Compiling Stan program...
## Start sampling

By summarizing the fit, we can see if the two authors differ in their use of the word “can” in their writings.

summary(fit_nb)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: N ~ offset(log(Total/1000)) + Authorship 
##    Data: d2 (Number of observations: 74) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept             1.00      0.09     0.82     1.19 1.00     3486     2799
## AuthorshipMadison    -0.08      0.16    -0.40     0.25 1.00     3265     2711
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     3.84      1.12     2.23     6.48 1.00     3497     2824
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
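
Because the model is on the log scale, the AuthorshipMadison coefficient is the log ratio of Madison’s rate to Hamilton’s. A sketch summarizing this ratio from the simulated draws:

post2 <- posterior_samples(fit_nb)
ratio <- exp(post2$b_AuthorshipMadison)      # Madison rate / Hamilton rate
quantile(ratio, c(0.025, 0.5, 0.975))
mean(ratio < 1)                              # P(Madison uses "can" at a lower rate)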