4 Learning About a Proportion
4.1 Introduction
Suppose data $y$ is observed from a sampling density $f(y \mid \theta)$ that depends on an unknown parameter $\theta$, and one's beliefs about $\theta$ before seeing the data are expressed by a prior density $g(\theta)$. By Bayes' rule, the posterior density of $\theta$ is

$$g(\theta \mid y) = \frac{g(\theta)\, f(y \mid \theta)}{\int g(\theta)\, f(y \mid \theta)\, d\theta}.$$

In the computation of the posterior density, note that the only terms involving the unknown parameter appear in the numerator; the integral in the denominator is simply a normalizing constant. So the posterior density is proportional to the product of the prior and the likelihood:

$$g(\theta \mid y) \propto g(\theta)\, f(y \mid \theta).$$

In a Bayesian analysis, both the posterior density and the marginal density play important roles. The posterior density contains all information about the parameter contained in both the prior density and the data. One performs different types of inference by computing relevant summaries of the posterior density. The marginal density $f(y) = \int g(\theta) f(y \mid \theta)\, d\theta$ gives the probability of the observed data averaged over the prior, and it plays a central role in prediction and in comparing models.
4.2 An Example on Learning About a Proportion
In this chapter, we discuss the basic elements of a Bayesian analysis through the problem of learning about a population proportion $p$.
As an example, suppose that the coordinator of developmental math courses at a particular university is concerned about the proportion of students in these courses who have math anxiety, where “math anxiety” is defined by obtaining a particular score on an anxiety rating instrument. A sample of 30 students takes the instrument and 10 have math anxiety. What can be said about the proportion of all developmental math course students who have math anxiety?
The standard estimate of $p$ is the sample proportion $\hat p = y/n$, where $y$ is the number of successes in a sample of size $n$; here $\hat p = 10/30 = 0.33$. The traditional Wald confidence interval for the proportion is

$$\hat p \pm z \sqrt{\hat p (1 - \hat p)/n},$$

where $z$ is the appropriate standard normal percentile. For large samples, this interval will cover the unknown proportion in repeated sampling with probability approximately equal to the stated confidence level. The interval behaves poorly, however, for small samples or extreme values of $\hat p$; in particular, if $y = 0$ successes are observed, then $\hat p = 0$ and the Wald interval collapses to a single point.
One ad-hoc solution to the “zero successes” problem is to initially add two artificial successes and two artificial failures to the data, and then apply the Wald interval to this adjusted data. This is a recommended approach in the literature, and the resulting confidence interval has good coverage properties in repeated sampling. We will see that this ad-hoc procedure has a natural correspondence with a Bayesian interval that incorporates prior information about the proportion.
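To make the calculation concrete, here is a small Python sketch (the function names are my own) of the Wald interval and the add-two-successes-two-failures adjustment, applied to the math anxiety data:

```python
from math import sqrt

def wald_interval(y, n, z=1.96):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = y / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - z * se, p_hat + z * se)

def adjusted_wald_interval(y, n, z=1.96):
    """Ad-hoc fix: add 2 artificial successes and 2 failures, then apply Wald."""
    return wald_interval(y + 2, n + 4, z)

# Math anxiety data: y = 10 of n = 30 students
print(wald_interval(10, 30))        # approximately (0.165, 0.502)

# The zero-successes pathology: the Wald interval collapses to a point,
# while the adjusted interval retains a positive width.
print(wald_interval(0, 30))         # (0.0, 0.0)
print(adjusted_wald_interval(0, 30))
```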
4.3 Using a Discrete Prior
One simple way of incorporating prior information about $p$ is to use a discrete prior: one specifies a list of plausible values of the proportion and assigns a probability to each value.
In the example, suppose one lists the possible values for the proportion of mathematics students with math anxiety displayed in the following table.
$p$ | 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
---|---|---|---|---|---|---|---|---|---|---
Prior | | | | | | | | | |
Suppose one’s best guess at the proportion of students with math anxiety is $p = 0.20$, and values of $p$ farther from 0.20 are viewed as progressively less likely. A convenient way to quantify these beliefs is to assign a weight to each value of $p$, placing the largest weight at the most likely value. The weights need not sum to one; the prior probabilities are obtained by dividing each weight by the sum of the weights (here the weights sum to 31). The values of the weights and the corresponding prior probabilities are displayed in the following table.
$p$ | 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
---|---|---|---|---|---|---|---|---|---|---
Prior Weight | 1 | 2 | 5 | 10 | 5 | 3 | 2 | 1 | 1 | 1
Prior | .032 | .065 | .161 | .323 | .161 | .097 | .065 | .032 | .032 | .032
Once this prior distribution is assigned, one can compute the posterior probabilities by use of Bayes’ rule. One observes $y$ successes in a sample of size $n$, so the likelihood of each proportion value is proportional to $p^y (1-p)^{n-y}$. The computations can be organized in a table: for each value of $p$, one records the prior probability and the likelihood, multiplies them to form the product column, and then divides each product by the sum of the products to obtain the posterior probability.
$p$ | Prior | Likelihood | Product | Posterior
---|---|---|---|---
– | – | – | – | –
– | – | – | – | –
– | – | – | – | –
– | – | – | SUM | –
The Bayes’ rule calculations are illustrated in the following table for our math anxiety example. For the example, we observed $y = 10$ students with math anxiety in a sample of $n = 30$, so the likelihood is proportional to $p^{10}(1-p)^{20}$. (For readability, the likelihood and product columns have been multiplied by $10^{12}$; this scaling does not affect the posterior probabilities, since any constant cancels in Bayes’ rule.)
$p$ | Prior | Likelihood | Product | Posterior
---|---|---|---|---
0.05 | 0.032 | 0 | 0 | 0.000
0.10 | 0.065 | 12 | 1 | 0.000
0.15 | 0.161 | 224 | 36 | 0.019
0.20 | 0.323 | 1181 | 381 | 0.200
0.25 | 0.161 | 3024 | 487 | 0.255
0.30 | 0.097 | 4712 | 457 | 0.239
0.35 | 0.065 | 5000 | 325 | 0.170
0.40 | 0.032 | 3834 | 123 | 0.064
0.45 | 0.032 | 2185 | 70 | 0.037
0.50 | 0.032 | 931 | 30 | 0.016
To interpret the posterior probabilities, remember that initially we believed that the proportion of math anxiety students was about 0.20, although we were unsure about its true value and the prior was relatively diffuse about this guess. After observing 10 math anxious students in a sample of 30, our opinion has shifted toward larger proportion values: the most probable value is now $p = 0.25$, and the values 0.20, 0.25, 0.30, and 0.35 together receive posterior probability of about 0.86.
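The arithmetic in the posterior table is easy to reproduce. The following Python sketch (using numpy; the variable names are my own) carries out the discrete-prior Bayes' rule computation for the math anxiety data:

```python
import numpy as np

# Proportion values and prior weights from the table
p = np.linspace(0.05, 0.50, 10)
weights = np.array([1, 2, 5, 10, 5, 3, 2, 1, 1, 1], dtype=float)
prior = weights / weights.sum()

# Likelihood for y = 10 math anxious students out of n = 30;
# the binomial coefficient is omitted since constants cancel in Bayes' rule
y, n = 10, 30
likelihood = p**y * (1 - p)**(n - y)

# Bayes' rule: posterior proportional to prior times likelihood
product = prior * likelihood
posterior = product / product.sum()

for value, prob in zip(p, posterior):
    print(f"p = {value:.2f}: posterior = {prob:.3f}")
```

The printed probabilities match the posterior column of the table up to rounding.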
4.4 Using a Noninformative Prior
There are some advantages to using a discrete prior for a proportion. It provides a starting point for finding a prior distribution that reflects one’s knowledge, before sampling, about the location of the proportion. Also, it is easy to summarize a discrete posterior distribution. But since the proportion $p$ is a continuous-valued parameter on the interval (0, 1), it is often more natural to represent one’s prior beliefs by a continuous density on this interval.
First, suppose one has little knowledge about the location of the proportion. In our example, suppose that one has little information about the proportion of students in the class who have math anxiety. How can one construct a prior distribution that reflects little or imprecise knowledge about the location of the parameter? This type of distribution is called a noninformative prior or ignorance prior. Using this type of prior, the posterior distribution will typically be more influenced by the data than the prior information.
One possible choice for a noninformative prior assumes that all values of the proportion are equally likely before sampling; that is, $p$ is assigned the uniform density $g(p) = 1$ for $0 < p < 1$.
If we observe $y$ successes in a sample of size $n$, the likelihood function is proportional to $p^y (1-p)^{n-y}$. Multiplying the uniform prior by the likelihood, the posterior density is proportional to $p^y (1-p)^{n-y}$, which we recognize as a beta density with shape parameters $y + 1$ and $n - y + 1$. If we use a uniform prior for $p$ in the math anxiety example, where $y = 10$ and $n = 30$, the posterior density is beta(11, 21).
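Assuming a uniform prior with the math anxiety data ($y = 10$, $n = 30$), the resulting beta(11, 21) posterior is easy to explore numerically; here is a brief Python sketch using scipy (standing in for the R functions used in the text):

```python
from scipy.stats import beta

# Uniform prior (beta(1, 1)) plus y = 10 successes in n = 30 trials
# yields a beta(y + 1, n - y + 1) = beta(11, 21) posterior.
y, n = 10, 30
posterior = beta(y + 1, n - y + 1)

print(posterior.mean())          # posterior mean 11/32 = 0.34375
print(posterior.interval(0.90))  # central 90% posterior interval
```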
4.5 Using a Conjugate Prior
In many situations, the use of noninformative priors is appropriate since the user does not have any knowledge about the parameter from previous experience. But in other situations such as the math anxiety example, the user does have knowledge about the unknown proportion before sampling and one wishes to construct a continuous prior on the unit interval that represents this prior knowledge.
One convenient family of prior distributions is the beta family with shape parameters $a$ and $b$, where the prior density is given by

$$g(p) \propto p^{a-1}(1-p)^{b-1}$$

for $0 < p < 1$. One way of assessing values of $a$ and $b$ is to think of the prior information as equivalent to a previously observed sample: the prior mean is $a/(a+b)$, and the sum $a + b$ plays the role of a prior sample size, reflecting the strength of one’s beliefs.
An alternative approach is to assess the parameters indirectly by specifying two quantiles of the prior distribution, say the prior median and the prior 90th percentile of $p$. Using the function beta.select() in the LearnBayes package, one matches these prior quantiles with the beta parameters $a$ and $b$.
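The quantile-matching idea behind beta.select() can be sketched in Python with a root finder. The function name match_beta and the example quantiles (prior median 0.25, 90th percentile 0.45) are my own illustrations, not values from the text:

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import beta

def match_beta(prob1, quantile1, prob2, quantile2):
    """Find beta shape parameters (a, b) whose prob1- and prob2-quantiles
    equal quantile1 and quantile2 (the idea behind LearnBayes::beta.select).
    The search is done on the log scale to keep a and b positive."""
    def equations(log_ab):
        a, b = np.exp(log_ab)
        return [beta.ppf(prob1, a, b) - quantile1,
                beta.ppf(prob2, a, b) - quantile2]
    a, b = np.exp(fsolve(equations, x0=[0.0, 0.0]))
    return a, b

# Illustrative assessment: prior median 0.25, prior 90th percentile 0.45
a, b = match_beta(0.5, 0.25, 0.9, 0.45)
print(a, b)
```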
Once one assesses the values of the beta parameters, it is easy to compute the posterior distribution. By multiplying the prior and the likelihood, one obtains that the posterior density of $p$ is proportional to

$$p^{a+y-1}(1-p)^{b+n-y-1},$$

which we recognize as a beta density with updated parameters $a + y$ and $b + n - y$. In our example, if our prior is beta(4.0, 12.5) and we observe $y = 10$ math anxious students in a sample of $n = 30$, the posterior density is beta(4.0 + 10, 12.5 + 20) = beta(14.0, 32.5).
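The conjugate update is one line of arithmetic; here is a Python sketch (scipy standing in for the R tooling used in the text):

```python
from scipy.stats import beta

# Conjugate updating: a beta(a, b) prior combined with y successes
# in n trials yields a beta(a + y, b + n - y) posterior.
a, b = 4.0, 12.5      # assessed prior
y, n = 10, 30         # observed data
posterior = beta(a + y, b + n - y)   # beta(14.0, 32.5)

print(posterior.mean())   # (a + y) / (a + b + n) = 14/46.5, about 0.301
```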
4.6 Inference
After one observes data, then all knowledge about the parameter is contained in the posterior distribution. It is common to simply display the posterior density and the reader can learn about the location and spread by simply looking at this curve. To obtain different types of statistical inferences, one summarizes the posterior distribution in various ways. We illustrate using the posterior distribution to obtain point and interval estimates of the parameter.
4.6.1 Point Inference
A suitable point estimate of a parameter is a single-number summary of the posterior density. The posterior mean is the mean of the posterior distribution, given by the integral

$$E(p \mid y) = \int_0^1 p \, g(p \mid y) \, dp.$$

In the case where the posterior is a beta($a + y$, $b + n - y$) density, the posterior mean has the simple closed form $(a + y)/(a + b + n)$; for our beta(14.0, 32.5) posterior, the mean is $14/46.5 = 0.301$. An alternative point estimate is the posterior median. Using the R function qbeta with the beta(14.0, 32.5) posterior, the posterior median is found to be approximately 0.298.
In the case where the posterior density is approximately symmetric, as in this example, the posterior mean, posterior median, and posterior mode will be approximately equal. In other situations where the posterior density is right or left skewed, these summary values can be different. One nice feature of the posterior median is its clear interpretation as the value that divides the posterior probability in half.
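These point summaries are quick to compute. A Python sketch for the beta(14.0, 32.5) posterior, where scipy's beta.ppf plays the role of R's qbeta:

```python
from scipy.stats import beta

a, b = 14.0, 32.5     # posterior from the conjugate analysis

post_mean = a / (a + b)               # about 0.301
post_median = beta.ppf(0.5, a, b)     # qbeta(0.5, 14, 32.5) in R, about 0.298
post_mode = (a - 1) / (a + b - 2)     # about 0.292; formula valid for a, b > 1

print(post_mean, post_median, post_mode)
```

For this mildly right-skewed posterior the three summaries are close but ordered mode < median < mean, as the text describes for skewed cases.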
4.6.2 Interval Estimation
Typically, a point estimate such as a posterior median is insufficient for understanding the location of a parameter. A Bayesian interval estimate, or credible interval, is an interval that contains the parameter with a given probability. Specifically, a 90% equal-tail credible interval, say, is bounded by the 5th and 95th percentiles of the posterior distribution, so the posterior probability that the parameter falls inside the interval is 0.90.
In our example, the posterior for $p$ is beta(14.0, 32.5), and an equal-tail interval is found by computing the appropriate beta quantiles with the R function qbeta. An alternative interval estimate is the highest posterior density (HPD) interval, the shortest interval containing the stated probability. Using the function hpd in the TeachingDemos package, one computes the 90% HPD interval (0.191, 0.409). Since the posterior density is approximately symmetric, the equal-tail and HPD intervals are approximately equal.
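Both intervals can be reproduced in Python. The sketch below computes a 90% equal-tail interval directly from beta quantiles and finds the HPD interval by minimizing interval width over all intervals with 0.90 posterior content (the 0.90 level is inferred from the width of the reported interval, and the helper names are my own):

```python
from scipy.optimize import minimize_scalar
from scipy.stats import beta

a, b = 14.0, 32.5   # posterior for the proportion
level = 0.90

# Equal-tail interval: cut off (1 - level)/2 probability in each tail
equal_tail = (beta.ppf(0.05, a, b), beta.ppf(0.95, a, b))

# HPD interval: among all intervals with posterior content `level`,
# choose the one of smallest width
def interval_width(lower_tail):
    return beta.ppf(lower_tail + level, a, b) - beta.ppf(lower_tail, a, b)

result = minimize_scalar(interval_width, bounds=(0.0, 1.0 - level),
                         method="bounded")
hpd = (beta.ppf(result.x, a, b), beta.ppf(result.x + level, a, b))

print(equal_tail)
print(hpd)   # close to the (0.191, 0.409) interval reported in the text
```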
4.6.3 Estimation of Probabilities
One attractive feature of the Bayesian approach is that one can judge the plausibility that the parameter falls in different regions by simply computing the posterior probabilities of those regions. In the math anxiety example, suppose we are interested in the plausibility that the proportion falls in the intervals (0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8), and (0.8, 1.0). The posterior distribution for the proportion of math anxious students is beta(14.0, 32.5), and by use of the R pbeta command, we can compute the probabilities of these regions; they are displayed in the following table.
Interval | Posterior Probability |
---|---|
(0, 0.2) | 0.06 |
(0.2, 0.4) | 0.87 |
(0.4, 0.6) | 0.08 |
(0.6, 0.8) | 0.00 |
(0.8, 1.0) | 0.00 |
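The probabilities in the table are differences of the beta cdf at the interval endpoints; a Python sketch mirroring the R pbeta computation:

```python
from scipy.stats import beta

a, b = 14.0, 32.5   # posterior for the proportion
edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

cdf = beta.cdf(edges, a, b)
region_probs = cdf[1:] - cdf[:-1]   # P(lo < p < hi) for each interval

for lo, hi, prob in zip(edges, edges[1:], region_probs):
    print(f"({lo}, {hi}): {prob:.2f}")
```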
4.7 Using Alternative Priors
The choice of a beta prior is made for convenience. With a beta prior, the posterior has the same functional (beta) form, and it is easy to summarize the posterior distribution. But Bayes’ rule can be applied with any continuous prior density for $p$.
In some situations, one may have prior beliefs about the logit of the proportion, $\theta = \log\!\left(\frac{p}{1-p}\right)$, which takes values over the entire real line. For example, one might assign $\theta$ a normal prior density; the prior this induces for $p$ is not a member of the beta family. As before, the likelihood function is proportional to $p^y (1-p)^{n-y}$, and the posterior density is proportional to the product of this prior and the likelihood.
In this situation, we no longer have a conjugate analysis, since the prior and posterior densities have different functional forms. Moreover, the posterior has a functional form that we do not recognize as a member of a familiar family such as the beta. However, this just means that we will need alternative tools to summarize the posterior distribution to perform inferences.
4.8 Prediction
In this chapter, we have focused on the use of the posterior distribution to make inferences about the proportion $p$. A second important use of the Bayesian approach is prediction, where one is interested in the likely outcomes of a future sample.
Let $\tilde y$ denote the number of successes in a future sample of size $m$. Conditional on the proportion $p$, $\tilde y$ has a binomial distribution with parameters $m$ and $p$.

Suppose we assign $p$ a density $g(p)$ representing our current beliefs. Then the predictive density of $\tilde y$ is obtained by averaging the binomial density over this distribution:

$$f(\tilde y) = \int_0^1 f(\tilde y \mid p)\, g(p)\, dp.$$

Suppose our current knowledge about the proportion is contained in a beta($a$, $b$) density. Then this integral can be evaluated analytically, and the predictive density of $\tilde y$ has the beta-binomial form

$$f(\tilde y) = \binom{m}{\tilde y} \frac{B(a + \tilde y,\, b + m - \tilde y)}{B(a, b)}, \qquad \tilde y = 0, \ldots, m,$$

where $B(\cdot, \cdot)$ denotes the beta function.
In our example, after observing the sample, the beliefs about the proportion of math anxious students are represented by a beta(14.0, 32.5) distribution. By use of the R function pbetap() in the LearnBayes package, one can compute the predictive density for the number of math anxious students in a future sample of a given size.
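Since scipy provides the beta-binomial distribution directly, the predictive computation can also be sketched in Python (scipy's betabinom plays the role of pbetap() in LearnBayes; the future sample size m = 20 is an illustrative choice, not a value from the text):

```python
from scipy.stats import betabinom

a, b = 14.0, 32.5   # current beliefs about the proportion
m = 20              # size of the future sample (illustrative)

# Predictive distribution of the number of math anxious students
predictive = betabinom(m, a, b)
probs = [predictive.pmf(k) for k in range(m + 1)]

# The predictive mean is m * a / (a + b), about 6.0 students
print(predictive.mean())
```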