Introduction
This topic introduces Bayesian statistical inference for a population proportion. We imagine a hypothetical population of two types which we will call "successes" and "failures". The proportion of successes in the population is denoted p. We take a random sample from the population -- we observe s successes and f failures. The goal is to learn about the unknown proportion p on the basis of this data.
A Model and Prior
In this setting, a model is represented by the population proportion p. We don't know its value, and we represent our beliefs about its location in terms of a prior probability distribution. In this topic, we introduce proportion inference by using a discrete prior distribution for p.
You construct a prior by

- specifying a list of plausible values for the proportion p
- assigning probabilities to these values that reflect your knowledge about p
It is generally difficult to construct a prior since most of us have had little experience doing it. In the classroom setting, it is helpful to use standard types of prior distributions which are relatively easy to specify.
One type of standard "noninformative" prior

- lets the proportion p be one of the equally spaced values 0, .1, .2, ..., .9, 1
- assigns each value of p the same probability 1/11
This prior is certainly convenient and works reasonably well when you don't have a lot of data. But, as the following example illustrates, it is relatively easy to construct a prior when there is significant prior information about the proportion of interest.
An Example: Who's Better?
Suppose that Andre Agassi is playing Boris Becker in a tennis match. A model is a description of the relative abilities of Agassi and Becker. You can use various expressions to describe models. You might think the two players "are evenly matched". Maybe you're "sure Agassi will win", or maybe you think it is "likely, but not a sure thing" that Becker will win. In statistical inference, it is helpful if you can quantify these opinions.

Let p be the probability that Agassi wins. One can view p as the proportion of times Agassi would win if the two players could play a large number of tennis matches under identical conditions.
Next you choose some plausible models, or values of the proportion p. Suppose you think there are three possibilities --- Agassi is a little better player than Becker, or they are evenly matched, or Becker is a little better. We match these statements with the respective proportion values p = .6, .5, .4. The value p = .6 is chosen by imagining how many matches Agassi would win against Becker in a series of 100 contests, if Agassi were indeed the better player. The value p = .4 can be chosen in a similar way.
After you choose the models, you may think that some values of the proportion p are more or less likely. These opinions are reflected in a choice of prior probabilities on the set of models. For example, if you think Agassi will defeat Becker, then the proportion value p = .6 should get a larger prior probability than the value p = .4. Likewise, if you thought Becker was a better player, then p = .4 would get more probability. In the following discussion, we will illustrate inference for a person who thinks Agassi is the superior player and assigns the following prior to p:
| MODEL | PRIOR |
| --- | --- |
| p = .4 (Becker is better) | .2 |
| p = .5 (same ability) | .3 |
| p = .6 (Agassi is better) | .5 |
Likelihood
To use Bayes' rule, we next have to specify the likelihood. This is the probability of the actual data result for each model.
Here the data is the number of successes, s, and the number of failures, f, in our random sample taken from the population. Our model is the proportion of successes p. If the proportion is p, then the probability of observing s successes and f failures is

LIKELIHOOD = p^s (1 - p)^f

(we can ignore the binomial coefficient that counts the orderings of successes and failures, since it is the same for every value of p and cancels in Bayes' rule).
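As a quick numeric check, here is a short Python sketch (my own illustration; the page itself uses Minitab) evaluating this likelihood for the three tennis models when a single success is observed:

```python
# Likelihood p**s * (1 - p)**f for one observed success (s = 1, f = 0);
# with these data the likelihood reduces to p itself
for p in [0.4, 0.5, 0.6]:
    print(p, p**1 * (1 - p)**0)   # prints .4, .5 and .6
```

These values reappear as the LIKELIHOOD column of the Bayes' table in the tennis example below.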
Posterior Probabilities
Suppose that we have taken our sample and have observed values of s and f. By Bayes' rule, the posterior probability that the proportion takes on a value p is proportional to the product

PRODUCT = Prob(p) x p^s (1 - p)^f,

where Prob(p) represents our prior probability of this value.
The calculation of the posterior probabilities is conveniently displayed using a tabular format (called the Bayes' table). (In the table below, p1, ..., pk represent k possible values of the proportion.)
| PROPORTION | PRIOR | LIKELIHOOD |
| --- | --- | --- |
| p1 | P(p1) | p1^s (1 - p1)^f |
| p2 | P(p2) | p2^s (1 - p2)^f |
| ... | ... | ... |
| pk | P(pk) | pk^s (1 - pk)^f |
To compute the posterior probabilities, we

- multiply the prior and likelihood values for each proportion value to get products
- sum the products
- divide each product by the sum to get posterior probabilities (these steps are implemented in the sketch below)
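The three steps above translate directly into a few lines of code. Here is a minimal Python sketch (an illustration of the method, not code from the original page) that builds the posterior column of the Bayes' table for any discrete prior:

```python
def bayes_table(p_values, prior, s, f):
    """Posterior probabilities for a discrete prior on a proportion.

    p_values : plausible values of the proportion p
    prior    : prior probabilities of those values (summing to 1)
    s, f     : observed numbers of successes and failures
    """
    # Likelihood of s successes and f failures under each model
    like = [p**s * (1 - p)**f for p in p_values]
    # Step 1: multiply prior and likelihood to get the products
    product = [pr * lk for pr, lk in zip(prior, like)]
    # Step 2: sum the products
    total = sum(product)
    # Step 3: divide each product by the sum
    return [prod / total for prod in product]
```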
Computing the Posterior Probabilities for the Tennis Example
We illustrate computing the posterior probabilities in our example. Suppose Agassi and Becker play a single match and Agassi wins. We have taken a single observation and observed a success, so s = 1 and f = 0.
Here is the Bayes' table:
| p | PRIOR | LIKELIHOOD | PRODUCT | POSTERIOR |
| --- | --- | --- | --- | --- |
| .4 | .2 | .4 | .2 x .4 = .08 | .08/.53 = .15 |
| .5 | .3 | .5 | .3 x .5 = .15 | .15/.53 = .28 |
| .6 | .5 | .6 | .5 x .6 = .30 | .30/.53 = .57 |
| SUM | | | .53 | 1 |
Before the tennis match, your probability that Agassi was better was .5. After seeing Agassi win the match, your new probability that Agassi is better has increased to .57. This means that you are now more confident that Agassi is better.
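As a check, the bayes_table sketch from the previous section reproduces this table's posterior column:

```python
p_values = [0.4, 0.5, 0.6]
prior = [0.2, 0.3, 0.5]

posterior = bayes_table(p_values, prior, s=1, f=0)
print([round(post, 2) for post in posterior])   # [0.15, 0.28, 0.57]
```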
Computing Posterior Probabilities Using a Bayes' Box

We demonstrate the Bayes' box method of computing the posterior probabilities. This method helps in showing the connection between the model (value of p) and the data (values of s and f). In this example, suppose there are 300 pairs of tennis players just like Agassi (A) and Becker (B). (The choice of 300 is made for convenience --- any large number that produces whole counts in the table below would be suitable.) Your prior placed probabilities .2, .3, .5 on the models "A is weaker than B", "A and B are equally matched", and "A is better than B". So you would expect 60 = 300 x .2 of the pairs to have A weaker than B, 90 = 300 x .3 of the pairs to be equally matched, and 150 = 300 x .5 of the pairs to have A better than B.
| MODEL | Agassi wins | Becker wins | TOTAL |
| --- | --- | --- | --- |
| p = .4 (Becker is better) | | | 60 |
| p = .5 (same ability) | | | 90 |
| p = .6 (Agassi is better) | | | 150 |
| TOTAL | | | 300 |
The data in this example is the result of one tennis match. There are two possible outcomes --- either "Agassi wins" or "Becker wins". The next step in this process is to relate the models to this data. You think about the frequencies of the data outcomes for each model. Suppose p = .4 (Becker is a better player) and Agassi and Becker play each other 60 times. Then you would expect Agassi to win 24 matches and Becker the remaining 36. Similarly, if the players are equally good (p = .5) and they play 90 matches, you expect each player to win 45 matches. If Agassi is the superior player (p = .6) and they play 150 matches, then you expect Agassi to win 90 and Becker 60. We place these counts in a Bayes' box in which each row corresponds to a value of the proportion p and each column is a result of the single tennis match.
| MODEL | Agassi wins | Becker wins | TOTAL |
| --- | --- | --- | --- |
| p = .4 (Becker is better) | 24 | 36 | 60 |
| p = .5 (same ability) | 45 | 45 | 90 |
| p = .6 (Agassi is better) | 90 | 60 | 150 |
| TOTAL | 159 | 141 | 300 |
Suppose Agassi wins the match. Presumably we would be more confident that Agassi is the better player. We can learn how much more confident by updating our probabilities about the models by use of Bayes' rule. This computation is made by looking at the Bayes' box and restricting attention to the observed data outcome ("Agassi wins match"). We note that Agassi won 24 + 45 + 90 = 159 matches in our hypothetical experiment. Of these 159 matches, Becker was the better player in 24 of them, the two players were equivalent in 45, and Agassi was better in 90. Our updated table of model counts and proportions is given below.
| MODEL | Agassi wins | PROPORTION |
| --- | --- | --- |
| p = .4 (Becker is better) | 24 | 24/159 = .15 |
| p = .5 (same ability) | 45 | 45/159 = .28 |
| p = .6 (Agassi is better) | 90 | 90/159 = .57 |
| TOTAL | 159 | 1 |
The probability that Agassi is the better player is .57, which is the same number we found using the Bayes' table.
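The Bayes' box bookkeeping is also easy to mimic in code. The sketch below (again my own Python illustration, using the hypothetical 300 pairs from the text) builds the expected counts and then restricts attention to the observed "Agassi wins" column:

```python
n_pairs = 300                  # hypothetical pairs of players, as in the text
p_values = [0.4, 0.5, 0.6]
prior = [0.2, 0.3, 0.5]

# Expected number of pairs under each model: 60, 90, 150
pairs = [n_pairs * pr for pr in prior]
# Expected matches won by Agassi within each group: 24, 45, 90
agassi_wins = [n * p for n, p in zip(pairs, p_values)]

# Restrict attention to the observed outcome ("Agassi wins")
total_wins = sum(agassi_wins)                       # 159
posterior = [w / total_wins for w in agassi_wins]   # .15, .28, .57
```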
Using a Prior with More Models and More Data
We illustrate learning about a proportion in this same example using a different prior and more data. Suppose that one wishes to use the "default" prior where the proportion p is allowed to take one of the 11 values 0, .1, .2, ..., .9, 1 and each proportion value is assigned the same prior probability. This prior reflects that the person has little knowledge about the relative abilities of Agassi and Becker. It is plausible that Becker will always defeat Agassi and p = 0, or alternatively that Agassi will always win and p = 1. In addition, the person is placing equal probabilities on these 11 different scenarios, reflecting the lack of knowledge about the relative abilities of professional tennis players.
Also suppose that the person observes the results of ten matches between the two players. In these matches, Agassi wins 7 times and Becker wins the remaining 3. Using our notation above where a "success" is defined as Agassi winning, we observe s = 7 and f = 3.
We illustrate computing the posterior probabilities for the proportion p using the Minitab macro p_disc. To use this program, one places the values of p into a single column in the Minitab data table and the corresponding prior probabilities in a different column. In the output below, these columns are named 'p' and 'prior'. The computation of the posterior is accomplished by the Minitab command (typed in the Session Window)
```
MTB > %p_disc 'p' 'prior' 7 3
```

The Minitab output from this command is a Bayes' table, showing values of p, the prior probabilities, the likelihood, the products, and the posterior probabilities.

```
PRIOR AND POSTERIOR DENSITIES OF P:

 Row    P      PRIOR     LIKE  PRODUCT     POSTR
   1  0.0  0.0909091        0      0.0  0.000000
   2  0.1  0.0909091       33      3.0  0.000010
   3  0.2  0.0909091     2947    267.9  0.000864
   4  0.3  0.0909091    33736   3066.9  0.009892
   5  0.4  0.0909091   159156  14468.7  0.046668
   6  0.5  0.0909091   439188  39926.1  0.128779
   7  0.6  0.0909091   805728  73248.0  0.236256
   8  0.7  0.0909091  1000000  90909.1  0.293220
   9  0.8  0.0909091   754518  68592.6  0.221240
  10  0.9  0.0909091   215104  19554.9  0.063073
  11  1.0  0.0909091        0      0.0  0.000000
```

[Figure: line graph of the posterior probabilities of p]
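For readers without Minitab, the bayes_table sketch given earlier reproduces the POSTR column (a cross-check in Python, not the p_disc macro itself):

```python
p_values = [i / 10 for i in range(11)]   # 0, .1, .2, ..., .9, 1
prior = [1 / 11] * 11                    # the uniform "default" prior

posterior = bayes_table(p_values, prior, s=7, f=3)
for p, post in zip(p_values, posterior):
    print(f"p = {p:.1f}   posterior = {post:.6f}")
# p = 0.7 receives posterior probability 0.293220, agreeing with the output above
```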
Using this posterior probability distribution, we can answer different inferential questions.
What's my best guess at the relative abilities of Agassi and Becker?

We measure the relative abilities of the two players by p, the proportion of times Agassi would win if the two players played an infinite number of matches. One best guess of p is the value which is assigned the largest probability (the mode). Looking at the posterior probability table, we see that p = .7 has the largest probability -- so p = .7 would be our best guess.
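In code, the mode is a one-line lookup over the posterior computed above:

```python
# The posterior mode: the value of p with the largest posterior probability
best_guess, _ = max(zip(p_values, posterior), key=lambda pair: pair[1])
print(best_guess)   # 0.7
```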
Can I construct an interval of values that I am confident contains the unknown value of p?

Here we find a set of proportion values p which contains a high proportion, say 90%, of the posterior probability distribution. A simple way of finding this interval is to arrange the values of p from most likely to least likely, and then put values of p into the interval until the cumulative probability of the set exceeds .9. Below I have sorted the proportion values from most likely to least likely. Looking at the "cum prob" column, we see that the probability that p is in the set {.7, .6, .8, .5, .9} is .9426. So this set is a 90% probability interval for p.
| p | prob | cum prob |
| --- | --- | --- |
| 0.7 | 0.2932 | 0.2932 |
| 0.6 | 0.2363 | 0.5295 |
| 0.8 | 0.2212 | 0.7507 |
| 0.5 | 0.1288 | 0.8795 |
| 0.9 | 0.0631 | 0.9426 |
| 0.4 | 0.0467 | 0.9892 |
| 0.3 | 0.0099 | 0.9991 |
| 0.2 | 0.0009 | 1.0000 |
| 0.1 | 0.0000 | 1.0000 |
| 0 | 0.0000 | 1.0000 |
| 1 | 0.0000 | 1.0000 |
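The same greedy construction can be expressed in a few lines of Python, continuing the earlier sketch:

```python
# Sort the values of p from most likely to least likely
ranked = sorted(zip(p_values, posterior), key=lambda pair: pair[1], reverse=True)

# Add values to the interval until the cumulative probability reaches .9
interval, cum_prob = [], 0.0
for p, post in ranked:
    interval.append(p)
    cum_prob += post
    if cum_prob >= 0.9:
        break

print(sorted(interval), round(cum_prob, 4))   # [0.5, 0.6, 0.7, 0.8, 0.9] 0.9426
```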
What is the probability that Agassi is the better player?

Remember that p is the proportion of matches Agassi would win against Becker if they played a large number of matches. To say that Agassi is the better player means that p is larger than .5. From the posterior probability table, we compute (the value p = 1 contributes posterior probability 0)

Prob(p > .5) = Prob(p = .6, .7, .8, .9, or 1) = .236 + .293 + .221 + .063 = .813.

Since this probability (.813) is moderately large, we have some confidence that Agassi is the superior player.
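This tail probability is a simple filtered sum over the posterior table (one more line continuing the Python sketch):

```python
# Posterior probability that p exceeds .5 (Agassi is the better player)
prob_agassi_better = sum(post for p, post in zip(p_values, posterior) if p > 0.5)
print(prob_agassi_better)   # about 0.8138; .813 sums the rounded table entries
```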