2  Probability

2.1 Introduction

Bayesian thinking is based on the subjective viewpoint of probability. In this chapter, we will talk about the different ways of thinking about probability.

2.2 Measuring Uncertainty

We live in a world of uncertain events and some events are more likely to occur than other events. Words such as “likely”, “probable”, “possible”, “rare”, and “maybe” are used to describe this uncertainty. It is natural to use numbers that we call probabilities to quantify this uncertainty. The probability of an event \(A\), denoted \(P(A)\), is a number between 0 and 1 assigned to the event \(A\) where a larger number indicates that the event is more likely to occur.

Some uncertain events already have numbers assigned to them. In games of chance where dice are rolled or cards are dealt from a well-shuffled deck, outcomes have particular probabilities. For example, the chance of rolling two dice equal to double-sixes is 1/36 and the chance of dealing a four of diamonds in a regular deck is 1/52. In actuarial tables, there are assigned probabilities that a person’s life span will be a particular length based on one’s gender and age. These actuarial tables are used by insurance companies to write up a life insurance policy and decide on the cost of the policy to the customer.

There are two ways of viewing probabilities that allow us to assign probabilities in games of chance and actuarial tables. The classical or “equally-likely” probability view assumes that one can represent the outcomes of a random experiment in such a way that the outcomes are equally likely. Then each outcome is assigned a probability equal to one divided by the total number of outcomes. In the dice example, there are 36 equally likely ways of representing the possible rolls of the dice, and the probability of one outcome (two sixes) is equal to 1/36. In the card example, there are 52 possible draws in a deck of cards, and if the deck is well-shuffled, each outcome such as “four of diamonds” is assigned the probability of 1/52.
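To make the equally-likely calculation concrete, here is a minimal sketch in Python (the code is ours, offered only as an illustration) that enumerates the 36 possible rolls of two dice and computes classical probabilities as counts of favorable outcomes divided by the total number of outcomes.

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely ordered rolls of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# Classical probability: favorable outcomes divided by total outcomes.
p_double_six = Fraction(sum(1 for r in rolls if r == (6, 6)), len(rolls))
p_sum_seven = Fraction(sum(1 for r in rolls if sum(r) == 7), len(rolls))

print(p_double_six)  # 1/36
print(p_sum_seven)   # 1/6
```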

A second way of thinking about probabilities is based on long-run relative frequencies. Suppose you are able to repeat a random experiment many times under similar conditions. Then the probability of an event is approximated by its relative frequency in the large number of trials. This viewpoint can be applied in games of chance. For example, the probability that the sum of two dice is equal to 7 can be approximated by the relative frequency of 7 in many rolls of the two dice. This definition can also be used for actuarial tables. The chance that a male of age 70 will survive ten years can be approximated by the relative frequency of 70-year-old males who survive ten years.
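The relative frequency viewpoint lends itself to a short simulation. The sketch below (again in Python, purely illustrative) repeats the two-dice experiment many times and reports the fraction of rolls whose sum is 7; this fraction should be close to the classical value \(6/36 \approx 0.167\).

```python
import random

random.seed(2023)  # seed chosen only for reproducibility

n_trials = 100_000
count_seven = 0
for _ in range(n_trials):
    roll = random.randint(1, 6) + random.randint(1, 6)  # sum of two fair dice
    if roll == 7:
        count_seven += 1

# The relative frequency approximates the probability that the sum is 7.
print(count_seven / n_trials)
```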

Is it possible to use the relative frequency viewpoint to measure uncertainty for all random events? Lindley makes a distinction between two types of events. Statistical events are events that can be repeated under similar conditions, and non-statistical events are events that are essentially unique and cannot be repeated. Games of chance are statistical events, while one-time events such as “Jones committed the murder” or “a Republican will be the next American president” are non-statistical events. One can use the relative frequency perspective to measure the probability of statistical events, but this viewpoint is clearly inappropriate for measuring the chance of non-statistical events. There are also practical issues in applying the relative frequency viewpoint. Sometimes it is not clear how to repeat a random experiment under similar circumstances. For example, suppose one observes three flips of a coin with the first head on the third flip. How does one repeat this experiment? Does one consider sets of three flips, or instead consider flips that end with the first head?

2.3 The Subjective Viewpoint

There is a third viewpoint of probability, the subjective viewpoint, that is the basis for Bayesian thinking. We start with a proposition, which is any statement that can be true or false. For example, consider the proposition “it will rain tomorrow.” A person’s belief in the truth of this proposition can vary; the person may believe it is certainly false or she may believe it is certainly true. A probability is a number attached to the proposition “it will rain tomorrow” that reflects this belief. A probability of 1 means that the person believes with certainty that the proposition is true, and a probability of 0 means that the person believes with certainty that the proposition is false. A probability of 0.5 means that the person believes that the propositions “rain tomorrow” and “no rain tomorrow” are equally likely.

In the above definition, it should be noted that we are assigning numerical measures to propositions, which are more general than events. A proposition is any statement that is either true or false. Both statistical events and non-statistical events are examples of propositions. Second, an assigned probability is personal in that it reflects one person’s belief about the truth of the proposition. Different people may assign different probabilities to a given proposition. Certainly if we consider the proposition “Susie is receiving a final grade of A in her statistics class”, Susie and her instructor may have different beliefs about the truth of this proposition. Also, a person’s probabilities about a proposition may change over time. As she obtains more information, her belief and therefore her probabilities can change. In our example, as Susie receives exam grades, she will have different information and therefore possibly different beliefs in the proposition that she will get a final grade of A.

To summarize, from the subjective viewpoint, a probability is a numerical measure of the degree of belief by a person in a proposition based on the person’s current information. If \(E\) is a proposition and \(H\) is the current information or history of the individual, then we represent a probability by the notation \(P(E | H)\).

2.4 Measuring Subjective Probabilities

At this point, all we know about a subjective probability \(P(E|H)\) is that it falls between the values of 0 and 1 and that larger values correspond to stronger beliefs in the truth of the proposition. Since subjective probabilities are generally difficult to assess, it is appropriate to describe some measurement methods.

The direct measurement approach takes a measurement of an object by comparing it with a collection of reference objects. To learn about the length of a piece of string, a direct measurement method uses a ruler, and a direct measurement way of learning about the weight of an object places it on a scale. To directly measure probabilities, we need to consider a set of propositions where the probabilities are known.

Consider the following “balls in bag” experiment. We have a bag with \(r\) red and \(w\) white balls and one ball is chosen from the bag. Let \(R(r, w)\) denote the proposition that the chosen ball is red. We assume that the balls are all identical in appearance except for color, the bag is mixed well, and one makes the draw blindfolded. Then we would agree that the probability of \(R(r, w)\) is equal to the fraction \(r/(r+w)\). Consider the set of reference propositions \(R(0, 1), R(1, 0), R(1, 1), \ldots\). The reference probabilities \(r/(r+w)\) cover all rational values between 0 and 1.

To use these reference propositions to measure your probability \(P(E | H)\), you compare the proposition \(E\) with the reference propositions {\(R(r, w)\)}. Suppose you have a stronger belief in the truth of \(E\) than of \(R(r, w)\). This implies that your probability \(P(E | H)\) exceeds the fraction \(r/(r+w)\). By making a sequence of comparisons of \(E\) with \(R(r, w)\) for different choices of \((r, w)\), you can in theory obtain an accurate assessment of your probability \(P(E | H)\).

I am a Phillies fan and I wish to assess my probability of the event \(A =\) “the Phillies will be in the World Series” this season. To use this direct approach to assessment, I make the following comparisons.

  1. I first compare “Phillies in the World Series” with the event \(R(1, 1)\), choosing a red out of a bowl with one red ball and one white ball. I believe \(R(1, 1)\) is more likely, so my probability of \(A\) satisfies \(P(A) < 1/2\).

  2. Next, I compare the event \(A\) with the event \(R(1, 3)\), choosing a red out of a bowl with one red and three white balls. I believe that \(R(1, 3)\) is more likely, so \(P(A) < 1/4\).

  3. I continue by comparing \(A\) with \(R(1, 7)\), choosing a red out of a bowl with one red and seven white balls. I believe that “Phillies in the World Series” is the more likely event. So \(P(A) > 1/8\), and combining my comparisons, I now know that \(P(A)\) is in the interval \((1/8, 1/4)\).

  4. I continue with these assessments until it is difficult to make further comparisons. I will then have a small interval that I believe contains my probability of the event; a sketch of this interval-narrowing process is given after this list.
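One way to think about this comparison procedure is as an interval-narrowing computation. The sketch below is only schematic: the function `prefer_reference` is a hypothetical stand-in for the person's judgment at each comparison, and in practice a person would stop after a handful of comparisons rather than iterate mechanically.

```python
def assess_probability(prefer_reference, n_comparisons=10):
    """Narrow an interval for a subjective probability by repeated comparisons.

    prefer_reference(p) should return True when drawing a red ball from a
    reference bag with probability p of red feels MORE likely than the event
    of interest; it stands in for the person's judgment at each step.
    """
    lower, upper = 0.0, 1.0
    for _ in range(n_comparisons):
        p_ref = (lower + upper) / 2   # compare against the midpoint
        if prefer_reference(p_ref):
            upper = p_ref             # the event is judged less likely than p_ref
        else:
            lower = p_ref             # the event is judged at least as likely
    return lower, upper

# Hypothetical example: comparisons consistent with a probability near 0.15.
print(assess_probability(lambda p: p > 0.15))  # a short interval around 0.15
```

This mechanical bisection differs slightly from the comparisons above, which used the convenient reference probabilities 1/2, 1/4, and 1/8, but the idea of trapping the probability in an ever-shorter interval is the same.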

Other measurements are taken in an indirect manner. For example, one older style of thermometer is constructed by the use of mercury in a glass tube. Heat applied to the glass causes the mercury to expand and one measures the temperature by reading the location of the mercury on a printed numerical scale. For this type of thermometer, one indirectly measures temperature by use of the expansion and contraction of mercury. In a similar fashion, one can measure probabilities by means of bets that are indirectly related to probabilities.

A bet consists of a proposition that determines the outcome of the bet, odds that are offered by the bookmaker, and the amount of money that you are willing to stake. If you decide to stake $\(s\) on the proposition \(E\) at odds \(z\), this means that

  • if \(E\) is false, you give the bookmaker $\(s\)
  • if \(E\) is true, the bookmaker gives you $\(zs\)

The stake is the amount that you can lose if the proposition \(E\) is false and the odds is the ratio of the amount you can win if \(E\) is true to the amount you lose if \(E\) is false.
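As a small illustration of this payoff structure, the hypothetical helper below computes the bettor's gain or loss from a single bet with a given stake and odds.

```python
def bet_payoff(event_true, stake, odds):
    """Gain (+) or loss (-) to the bettor from a bet at the given odds."""
    return odds * stake if event_true else -stake

# Staking $5 at odds z = 4: lose $5 if E is false, win $20 if E is true.
print(bet_payoff(False, 5, 4), bet_payoff(True, 5, 4))  # -5 20
```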

Using this indirect method, one measures your probability \(P(E | H)\) by means of bets on \(E\) between you and a hypothetical bookmaker. One can fix a value of the stake, say \(s = \$5\) and bet on \(E\) using different values of the odds \(z\). You decide which bets to accept and reject.

How can one obtain one’s probability on the basis of these bets? First, note that you will accept any bet if the odds \(z\) is sufficiently large. But as the odds \(z\) is decreased, you will be less inclined to accept the bet, and for small values of the odds, you will reject the bet. There will be one odds value \(z_0\) where you will accept bets for odds \(z > z_0\) and reject bets for \(z < z_0\). The value \(z_0\) is often called the fair odds of the proposition \(E\) based on your current information, denoted by \(O(E | H)\). We transform odds to a probability by the expression \[ P(E | H) = \frac{1}{1+O(E|H)}. \]
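One informal way to motivate this transformation is the following. If your probability of \(E\) is \(p = P(E | H)\), then a bet at the fair odds \(z_0\) is one you regard as neither favorable nor unfavorable, so its expected gain is zero. With stake \(s\), \[ p \, z_0 s - (1 - p) s = 0 \quad \Longrightarrow \quad p = \frac{1}{1 + z_0}, \] which is the formula above with \(O(E | H) = z_0\).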

Let us illustrate this indirect method of measuring probability by assessing my probability of the event \(A\) that the Phillies are in the World Series. I decide on a stake of $5 and consider the following bets:

  1. BET 1: At odds \(z = 1\), I will either lose $5 if \(A\) is false or win $5 if \(A\) is true.
  2. BET 2: At odds \(z = 2\), I will either lose $5 if \(A\) is false or win $10 if \(A\) is true.
  3. BET 3: At odds \(z = 4\), I will either lose $5 if \(A\) is false or win $20 if \(A\) is true.
  4. BET 4: At odds \(z = 10\), I will either lose $5 if \(A\) is false or win $50 if \(A\) is true.

I will definitely not accept Bets 1 or 2 (the winning amounts are too small), and I would accept Bet 4, which seems to offer generous odds. After some thought, suppose I am satisfied with a bet between Bet 3 and Bet 4 where the odds are \(z = 7\). Then my fair odds would be \(O(A | H) = 7\) and my probability of “Phillies in the World Series” would be \[ P(A | H) = \frac{1}{1+7} = 0.125. \]
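The odds-to-probability conversion is easy to do by hand, but a small helper function (ours, for illustration only) makes it explicit for each of the bets above and for the fair odds of 7.

```python
def odds_to_probability(odds):
    """Convert betting odds z into the implied probability 1 / (1 + z)."""
    return 1 / (1 + odds)

# Implied probabilities for the four bets and for my fair odds of 7.
for z in [1, 2, 4, 10, 7]:
    print(z, odds_to_probability(z))
# z = 7 gives 1 / (1 + 7) = 0.125, matching the assessment above.
```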

2.5 True and Measured Probabilities

We have discussed two methods, a direct method and an indirect method, for measuring the subjective probability of a proposition. Generally people will have trouble applying these methods since they have little experience in specifying probabilities. So in a typical application, one will not be able to specify his/her probability with high accuracy. Here it is helpful to distinguish between a person’s true probability and her measured probability. A person’s true probability is the value she would obtain if she were able to make very fine comparisons of the likelihoods of events and had an infinite amount of time to make the assessment. But in real life, the person is unable to make such fine comparisons and will spend only a finite amount of time on this task. So the specified probability \(P(E | H)\) is simply a measured estimate of the true probability. Since our measuring methods, such as the balls in bag experiment, are relatively crude, there can be significant measurement error in the specification of this probability.