3 Bayes' Rule
3.1 Introduction
Here is a basic exposition of Bayes' rule. Suppose you have events $B_1, \ldots, B_k$ that form a partition of the sample space, and an observed event $A$ with $P(A) > 0$. One is interested in computing the conditional probabilities $P(B_j \mid A)$. Bayes' rule states that

$$P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^{k} P(A \mid B_i)\,P(B_i)}.$$
3.2 Illustrations of Bayes’ Rule
3.2.1 Example: Student Takes a Test
Suppose a student is taking a one-question multiple choice test with four possible choices. Either the student knows the material or she doesn't; denote these two possibilities by $K$ (she knows the material) and $K^c$ (she does not).

Here the events $K$ and $K^c$ are the two possible states of the student, and the teacher's beliefs before the test are $P(K) = 0.7$ and $P(K^c) = 0.3$. If the student knows the material she will answer the question correctly, so $P(\text{correct} \mid K) = 1$; if she does not know it, she guesses among the four choices, so $P(\text{correct} \mid K^c) = 1/4$.

Given that the student gets the question correct, we're interested in determining the probability of $K$, that is, $P(K \mid \text{correct})$. By Bayes' rule,

$$P(K \mid \text{correct}) = \frac{P(\text{correct} \mid K)\,P(K)}{P(\text{correct} \mid K)\,P(K) + P(\text{correct} \mid K^c)\,P(K^c)}.$$

Substituting in the given values, we obtain

$$P(K \mid \text{correct}) = \frac{(1)(0.7)}{(1)(0.7) + (0.25)(0.3)} = \frac{0.7}{0.775} \approx 0.903.$$

Does this answer make sense? Before the test, the teacher believed that the student knew the material with probability 0.7. The student got the question correct, which intuitively should increase the teacher's probability that the student knew the material. Bayes' rule allows us to explicitly compute how much the probability should increase – it has gone up from 0.7 to about 0.903.
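As a quick numerical check, the short Python sketch below carries out this calculation. It assumes, as in the story above, that a student who knows the material answers correctly for certain and that a student who does not know it guesses uniformly among the four choices.

```python
# Posterior probability the student knows the material, given a correct answer.
# Assumption: P(correct | knows) = 1 and P(correct | doesn't know) = 1/4 (pure guessing).
prior_knows = 0.7
p_correct_given_knows = 1.0
p_correct_given_not = 1 / 4

numerator = prior_knows * p_correct_given_knows
denominator = numerator + (1 - prior_knows) * p_correct_given_not
posterior_knows = numerator / denominator
print(round(posterior_knows, 3))  # prints 0.903
```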
3.2.2 Example: Balls in a Bag
Suppose a bag contains exactly one white ball. You roll a die and, if the outcome of the die roll is the number $i$, you put $i$ red balls into the bag. One ball is then drawn at random from the bag and its color is observed.

In this example, let the six possible die rolls $1, 2, \ldots, 6$ be the alternatives, and let $R$ denote the event that the drawn ball is red.

The probability of observing a red ball depends on the die roll. If the die roll is $i$, the bag contains $i$ red balls and one white ball, so

$$P(R \mid i) = \frac{i}{i+1}.$$

In this story, a red ball is observed and we are interested in computing the probability of each die roll given this observation, $P(i \mid R)$.

By substituting the known quantities into Bayes' rule, we have

$$P(i \mid R) = \frac{(1/6)\,\dfrac{i}{i+1}}{\sum_{j=1}^{6} (1/6)\,\dfrac{j}{j+1}}, \quad i = 1, \ldots, 6.$$

A convenient way of computing the die roll probabilities is by use of a table. In Table 2.1, each row corresponds to a specific die roll – we call these alternatives in the table. For each die roll, the table gives the initial probability $1/6$, the likelihood $i/(i+1)$ of observing a red ball, and the product of these two quantities. One computes the updated probabilities by dividing each product by the sum of the products. For example, the updated probability of the die roll 1 is $1/12$ divided by the sum of the six products, or approximately 0.113.
Alternative | Probability | Likelihood | Product
---|---|---|---
1 | 1/6 | 1/(1+1) | 1/12
2 | 1/6 | 2/(2+1) | 2/18
3 | 1/6 | 3/(3+1) | 3/24
4 | 1/6 | 4/(4+1) | 4/30
5 | 1/6 | 5/(5+1) | 5/36
6 | 1/6 | 6/(6+1) | 6/42
To make sense of these calculations, note that we started by assuming that all six possible rolls of the die were equally likely.
With the observation of a red ball, the updated probabilities are unequal and give support for larger rolls of the die.
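As a quick check of the arithmetic, the following short Python sketch (ours, not part of the text) reproduces the table by normalizing the prior-times-likelihood products:

```python
from fractions import Fraction

# One row per die roll i: prior 1/6 and likelihood i/(i+1) of drawing a red ball.
products = {i: Fraction(1, 6) * Fraction(i, i + 1) for i in range(1, 7)}
total = sum(products.values())

# Updated (posterior) probability of each roll, given that a red ball was drawn.
for i, product in products.items():
    print(i, product, round(float(product / total), 3))
```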
3.3 New Terminology
In general, we are interested in learning about which of several possible models $M_1, \ldots, M_k$ is true. Before observing any data, each model $M_i$ is assigned a probability $P(M_i)$ reflecting one's initial beliefs; these are called the PRIOR probabilities.

Now a particular data result $D$ is observed. For each model, the probability of obtaining this data result, $P(D \mid M_i)$, is called the LIKELIHOOD.

The updated probabilities $\{P(M_i \mid D)\}$, computed from the priors and likelihoods by Bayes' rule, are called the POSTERIOR probabilities.

A convenient way to perform the Bayes' rule calculations is by use of a table similar to the one in the example. We illustrate the use of the new terminology and the table calculations for two additional examples.
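The recipe in these tables is simple enough to automate. Here is a minimal Python helper of our own – `bayes_table` is a hypothetical name, not a function from the text – that takes prior probabilities and likelihoods and returns the posterior probabilities:

```python
def bayes_table(priors, likelihoods):
    """Posterior probabilities via the 'prior times likelihood, then normalize' recipe."""
    products = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(products)
    return [product / total for product in products]

# Example: the die-roll alternatives of Table 2.1.
print(bayes_table(priors=[1 / 6] * 6,
                  likelihoods=[i / (i + 1) for i in range(1, 7)]))
```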
3.4 Example: Testing for a Disease
Suppose you are one of the many people who are tested for a rare disease. From reports, you know that the incidence of this disease is 1 out of 5000. You take the test and there are two possible results: either you are told “ok” or “see your doctor for further checks.” How should you feel on the basis of each of these two results?

There are two possible models in this example: you are either “diseased” or “not diseased”. Assuming that you are a representative person from your community, your prior beliefs are that $P(\text{diseased}) = 1/5000 = 0.0002$ and $P(\text{not diseased}) = 0.9998$.
The “data” in this example is the screening test result. There are two outcomes: either the test will be “positive” or “+”, which is some indication that you have the disease, or “negative” or “-” which is good news. From past experience, the screening test has 5% false positives and 2% false negatives.
This means that if you really don’t have the disease, the chance you get a positive result is 0.05; that is, $P(+ \mid \text{not diseased}) = 0.05$.

Similarly, if you really have the disease, the chance of an incorrect negative result is 0.02: $P(- \mid \text{diseased}) = 0.02$, and so $P(+ \mid \text{diseased}) = 0.98$.

These values are the likelihoods – the probabilities of the data outcomes for each model.

Suppose you have a positive test result (the data is “+”). The Bayes’ rule calculations are shown in Table 2.2.
Model | Prior | Likelihood | Product | Posterior
---|---|---|---|---
Not diseased | 0.9998 | 0.05 | 0.04999 | 0.9961
Diseased | 0.0002 | 0.98 | 0.000196 | 0.0039
Before the test, your probability of having the disease was 0.0002, and after getting the positive test result this probability has increased to 0.0039. The new probability is almost 20 times the initial probability, but you are still very unlikely to have the disease.
What if you received a negative test result? We repeat the Bayes’ rule calculations in Table 2.3 with a change in the likelihood values.
Model | Prior | Likelihood | Product | Posterior
---|---|---|---|---
Not diseased | 0.9998 | 0.95 | 0.949810 | 0.999996
Diseased | 0.0002 | 0.02 | 0.000004 | 0.000004
The probability of having the disease has decreased from 0.0002 to 0.000004.
These results are often surprising to both doctors and patients. It is difficult to update probabilities accurately in one's head, and people faced with a positive test result typically believe much more strongly that they have the disease than the calculation warrants.
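For readers who want to reproduce both calculations, here is a brief Python sketch (the variable names are ours) that computes the posterior probability of disease after a positive and after a negative test result:

```python
prior = {"not diseased": 0.9998, "diseased": 0.0002}

# Likelihoods of each test result under each model
# (5% false positives, 2% false negatives).
likelihood = {
    "positive": {"not diseased": 0.05, "diseased": 0.98},
    "negative": {"not diseased": 0.95, "diseased": 0.02},
}

for result, like in likelihood.items():
    products = {m: prior[m] * like[m] for m in prior}
    total = sum(products.values())
    print(result, round(products["diseased"] / total, 6))
# positive 0.003906   (about 20 times the prior of 0.0002)
# negative 4e-06
```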
3.5 Example: The Three Door Problem
There is a famous probability problem, called the Three Door Problem or the Car and the Goats, that can be addressed by Bayes’ rule. On a TV show a contestant is shown three numbered doors, Door 1, Door 2, and Door 3, where one door hides a car and the other two doors hide goats. The contestant is allowed to choose a door and wins the corresponding prize. The contestant chooses Door 1. The host, who knows which door hides the car, then opens Door 2 to reveal a goat. The contestant is given the opportunity to change her selection. Should she switch her choice to Door 3?
In this example the unknown model is the location of the car. We will let the three models be “car behind Door 1”, “car behind Door 2”, and “car behind Door 3”. Since the contestant initially has no idea where the car is, each model is assigned a prior probability of 1/3.

Here the data is the event that the host showed Door 2 – we’ll call this event $D$. Table 2.4 begins the Bayes’ rule calculation; the likelihoods, products, and posterior probabilities remain to be filled in.
Model | Prior | Likelihood | Product | Posterior
---|---|---|---|---
Car behind Door 1 | 1/3 | | |
Car behind Door 2 | 1/3 | | |
Car behind Door 3 | 1/3 | | |
Let’s consider the likelihoods, the probabilities of the event $D$ (the host shows Door 2) under each of the three models.

- If the car is really behind Door 1, the host can show either Door 2 or Door 3. We will assume that the probability he shows Door 2 is a number $p$ between 0 and 1, so $P(D \mid \text{car behind Door 1}) = p$.
- If the car is behind Door 2, the host cannot show this door, so $P(D \mid \text{car behind Door 2}) = 0$.
- If the car is behind Door 3, the host cannot show that door – he has to show Door 2 – so $P(D \mid \text{car behind Door 3}) = 1$.
We complete the table in Table 2.5 by filling in the likelihoods and computing the posterior probabilities.
Model | Prior | Likelihood | Product | Posterior
---|---|---|---|---
Car behind Door 1 | 1/3 | $p$ | $p/3$ | $p/(p+1)$
Car behind Door 2 | 1/3 | 0 | 0 | 0
Car behind Door 3 | 1/3 | 1 | 1/3 | $1/(p+1)$
Let’s return to our question. Remember the contestant chose Door 1 and has the opportunity to switch to Door 3. Given the data “host shows Door 2”, we have found that the probability the car is behind Door 1 is $p/(p+1)$ and the probability the car is behind Door 3 is $1/(p+1)$. The contestant should switch if

$$\frac{1}{p+1} \geq \frac{p}{p+1},$$

or $1 \geq p$, which is true for every possible value of $p$. Switching to Door 3 can never hurt, and it strictly improves her chance of winning unless $p = 1$. In particular, if the host is equally likely to open Door 2 or Door 3 when the car is behind Door 1 (that is, $p = 1/2$), the probability the car is behind Door 3 is $1/(1/2 + 1) = 2/3$ – the classic answer to this puzzle.
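A short Python sketch (ours) evaluates these posterior probabilities for several values of $p$; the case $p = 1/2$ gives the familiar 2/3 answer:

```python
def three_door_posteriors(p):
    """Posterior probabilities of the car's location, given that the host shows Door 2.

    p is the probability the host opens Door 2 when the car is behind Door 1.
    """
    priors = [1 / 3, 1 / 3, 1 / 3]   # car behind Door 1, Door 2, Door 3
    likelihoods = [p, 0, 1]          # P(host shows Door 2 | car location)
    products = [pr * lk for pr, lk in zip(priors, likelihoods)]
    total = sum(products)
    return [product / total for product in products]

for p in [0.0, 0.5, 1.0]:
    print(p, [round(x, 3) for x in three_door_posteriors(p)])
# p = 0.5 gives [0.333, 0.0, 0.667]: switching doubles the chance of winning.
```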
3.6 Sequential Learning
A machine in a small factory is producing a particular automotive component. Most of the time (specifically, 90% of the time, from historical records), the machine is working well and produces 95% good parts. Some days the machine doesn’t work as well (it’s broken) and produces only 70% good parts. A worker inspects the first dozen parts produced by this machine on a particular morning and obtains the following results, where g denotes a good part and b a bad part:

g, b, g, g, g, g, g, g, g, b, g, b
The worker is interested in assessing the probability the machine is working well.
In this problem there are two models – either the machine is working well (“working” for short) or it is “broken”.
Based on the historical data, the worker assigns prior probabilities of 0.90 and 0.10 to the models “working” and “broken”. The data are the results of the inspection of the 12 parts. To understand the relationship between the data and the models, we compute the sampling probabilities, the probabilities of each data outcome under each model. If the machine is working, the probabilities of a good (g) part and a bad (b) part are 0.95 and 0.05, respectively. So

$$P(g \mid \text{working}) = 0.95, \quad P(b \mid \text{working}) = 0.05.$$

If instead the machine is broken, the probabilities of a good and a bad part are 0.70 and 0.30, respectively:

$$P(g \mid \text{broken}) = 0.70, \quad P(b \mid \text{broken}) = 0.30.$$

Now we’re ready to do the Bayes’ rule computation. The outcomes of the twelve inspections of parts are the data:

g, b, g, g, g, g, g, g, g, b, g, b.

The likelihoods are the probabilities of this data result under each of the two models. Assuming independence of the individual outcomes, the likelihood of the working model is given by

$$P(\text{data} \mid \text{working}) = (0.95)^9 (0.05)^3 \approx 0.00007878.$$

Similarly, the likelihood of the broken model is

$$P(\text{data} \mid \text{broken}) = (0.70)^9 (0.30)^3 \approx 0.00108955.$$
Using the “prior times likelihood” recipe, we compute the posterior probabilities in the following table.
Model | Prior | Likelihood | Product | Posterior |
---|---|---|---|---|
Working | 0.90 | 0.00007878 | 0.000070902 | 0.3942 |
Broken | 0.10 | 0.00108955 | 0.000108955 | 0.6058 |
We see that the posterior probability that the machine is broken is over 60% and perhaps the machine should be stopped for inspection and repair.
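As a check of the batch calculation, here is a short Python sketch (ours), assuming as above that the twelve outcomes are independent given the state of the machine:

```python
n_good, n_bad = 9, 3   # the twelve inspected parts: 9 good, 3 bad

prior = {"working": 0.90, "broken": 0.10}
p_good = {"working": 0.95, "broken": 0.70}

# Likelihood of the observed sequence under each model, assuming independence.
likelihood = {m: p_good[m] ** n_good * (1 - p_good[m]) ** n_bad for m in prior}
products = {m: prior[m] * likelihood[m] for m in prior}
total = sum(products.values())

for m in prior:
    print(m, round(likelihood[m], 8), round(products[m] / total, 4))
# working 7.878e-05 0.3942
# broken 0.00108955 0.6058
```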
There is another way to implement Bayes’ rule when the data are observed in a sequential manner. Before any data are collected, the inspector’s probabilities of the two states of the machine, working and broken, are given by 0.90 and 0.10. He observes the quality of the first part – “g” – and then he can immediately update his probabilities by Bayes’ rule.
Model | Prior | Likelihood | Product | Posterior |
---|---|---|---|---|
Working | 0.90 | 0.95 | 0.855 | 0.9243 |
Broken | 0.10 | 0.70 | 0.070 | 0.0757 |
After this single observation, he is slightly more confident (with probability 0.9243) that the machine is working.
The inspector’s current probabilities of the two models are 0.9243 and 0.0757. He observes the quality of the next part – “b” – and again he can update his probabilities by Bayes’ rule. In this table, “Prior” refers to his beliefs before observing this second outcome.
Model | Prior | Likelihood | Product | Posterior |
---|---|---|---|---|
Working | 0.9243 | 0.05 | 0.046215 | 0.6705 |
Broken | 0.0757 | 0.30 | 0.022710 | 0.3295 |
We see that, after observing two parts, the inspector’s probability that the machine is working is 0.6705.
One can continue learning in this sequential manner. As one observes the quality of each single part, the inspector can update his probability of the two models by Bayes’ rule. Table 2.9 summarizes the results of this sequential learning. The first row of the table displays the prior probabilities of the working and broken models and the following rows display the probabilities after each outcome is observed. Note that the final row indicates that the probabilities after observing the 12 parts are equal to 0.3942 and 0.6058. As expected, these posterior probabilities are the same as the ones computed using the group of 12 observations as data.
Observation | P(Working) | P(Broken) |
---|---|---|
Prior | 0.9000 | 0.1000 |
g | 0.9243 | 0.0757 |
b | 0.6706 | 0.3294 |
g | 0.7342 | 0.2658 |
g | 0.7894 | 0.2106 |
g | 0.8358 | 0.1642 |
g | 0.8735 | 0.1265 |
g | 0.9036 | 0.0964 |
g | 0.9271 | 0.0729 |
g | 0.9452 | 0.0548 |
b | 0.7421 | 0.2579 |
g | 0.7961 | 0.2039 |
b | 0.3942 | 0.6058 |
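Finally, here is a Python sketch (ours) of the sequential version of the calculation: after each observed part the posterior probabilities become the prior probabilities for the next update, and the final values agree with the batch answer of 0.3942 and 0.6058.

```python
observations = list("gbgggggggbgb")   # the twelve inspected parts, in order

prob = {"working": 0.90, "broken": 0.10}     # prior probabilities before any data
p_good = {"working": 0.95, "broken": 0.70}

for outcome in observations:
    # Likelihood of this single outcome under each model.
    like = {m: p_good[m] if outcome == "g" else 1 - p_good[m] for m in prob}
    products = {m: prob[m] * like[m] for m in prob}
    total = sum(products.values())
    prob = {m: products[m] / total for m in prob}   # posterior becomes the new prior
    print(outcome, round(prob["working"], 4), round(prob["broken"], 4))
# final line: b 0.3942 0.6058
```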