Theorem of Total Probabilities and Bayes’ Rule

From MM*Stat International


Recall the disjoint decomposition introduced earlier in this chapter: a set of events A_{1},A_{2},\ldots,A_{n} satisfying

  • A_{i}\neq\emptyset\quad\left(  i=1,2,\ldots,n\right)
  • A_{i} \cap A_{k}=\emptyset\quad\left(  i\neq k;i,k=1,2,\ldots,n\right)
  • A_{1}\cup A_{2}\cup\ldots\cup A_{n}=S

Theorem of Total Probabilities

Let A_{1},A_{2},\ldots,A_{n} be a disjoint decomposition. Then, for any event B\subset S with P(B)>0: \begin{align}
P(B)  &  = P\left(  B\cap A_{1}\right)  + P\left(  B\cap A_{2}\right)  +
\ldots+ P\left(  B\cap A_{n}\right) \\
&  = P\left(  B|A_{1}\right)  P\left(  A_{1}\right)  + P\left(  B|A_{2}
\right)  P\left(  A_{2}\right)  + \ldots+ P\left(  B|A_{n}\right)  P\left(
A_{n}\right) \\
&  = \sum_{i=1}^{n}P\left(  B|A_{i}\right)  P\left(  A_{i}\right)\end{align} The second equality applies the multiplication rule of probability, P\left(  B\cap A_{i}\right)  =P\left(  B|A_{i}\right)  P\left(  A_{i}\right)  , to each term.
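The sum in the theorem can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function name `total_probability` and the three-event decomposition used to call it are hypothetical and chosen only for illustration. Exact fractions are used so that the check that the priors sum to one is not disturbed by rounding.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Return P(B) = sum_i P(B|A_i) * P(A_i) for a disjoint decomposition.

    priors      -- P(A_1), ..., P(A_n); must sum to 1
    likelihoods -- P(B|A_1), ..., P(B|A_n), in the same order
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if sum(priors) != 1:
        raise ValueError("the priors of a disjoint decomposition must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))


# Hypothetical decomposition with three events (illustrative numbers only).
priors = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(1, 5)]

p_b = total_probability(priors, likelihoods)
print(p_b)  # 9/40
```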

Bayes’ Rule

Let A_{1},A_{2},\ldots,A_{n} be a disjoint decomposition. Then, for any event B\subset S with P(B)>0 and given conditional probabilities P\left(  B|A_{1}\right)  ,P\left(  B|A_{2}\right)  ,\ldots,P\left(  B|A_{n}\right)  : P\left(  A_{j}|B\right)  = \frac{P\left(  B|A_{j}\right)  P\left(  A_{j}\right)  }{\sum_{i=1}^{n}P\left(  B|A_{i}\right)  P\left(  A_{i}\right)  } \quad\forall j=1,\ldots, n

The Bayesian approach to statistics interprets the P\left(  A_{j}|B\right)  as posterior probabilities and the P\left(  A_{i}\right)  as prior probabilities. This conceptual approach to statistics accounts for prior information in the form of subjective belief rather than defining probabilities as limits of relative frequencies.

The Monty Hall problem (named after Monty Hall, host of the television show “Let’s Make a Deal”) is based on the following situation: Monty Hall shows his guest three doors A, B and C. The main prize is hidden behind one of them; the other doors conceal smaller prizes. For now, let us assume that the main prize is behind door B. Monty Hall asks the player to choose one door. After the player chooses (let us say door A), one of the doors which does not conceal the main prize is opened (let us say door C). The player can now decide whether to stay with the original choice (door A) or to switch to the other closed door (door B). What is the probability that the main prize is behind the originally selected door (A), and what is the probability that it is behind the other (unopened and not selected) door (B)?

The interactive example allows you to play the game with a “virtual Monty” and to study the relative frequency of winning the game depending on your strategy. By the statistical definition of probability, these relative frequencies approach the winning probabilities as the number of games grows.
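The relative frequencies of winning under the two strategies can also be estimated by simulation. The following is a minimal Python sketch, not the interactive example itself. It assumes that the prize and the first choice are each uniformly distributed over the three doors and that Monty picks at random when he has two doors to choose from. The function names `play` and `win_frequency`, the number of games and the seed are illustrative assumptions.

```python
import random

DOORS = ["A", "B", "C"]


def play(switch, rng):
    """Play one game and return True if the player wins the main prize."""
    prize = rng.choice(DOORS)    # door hiding the main prize
    choice = rng.choice(DOORS)   # player's first choice
    # Monty opens a door that is neither the player's choice nor the prize.
    # If two such doors exist (the player picked the prize), he picks at random.
    opened = rng.choice([d for d in DOORS if d != choice and d != prize])
    if switch:
        # Switch to the only door that is neither chosen nor opened.
        choice = next(d for d in DOORS if d != choice and d != opened)
    return choice == prize


def win_frequency(switch, n_games=100_000, seed=0):
    """Relative frequency of wins over n_games for a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(n_games))
    return wins / n_games


print("stay:  ", win_frequency(switch=False))
print("switch:", win_frequency(switch=True))
```

Increasing `n_games` makes the two frequencies settle more tightly around the probabilities of the respective strategies.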


Solution: Let us define the events

  • A : \ \text{Main prize is behind door A}
  • B : \ \text{Main prize is behind door B}
  • C : \ \text{Main prize is behind door C}
  • a : \ \text{Monty opens door A}
  • b : \ \text{Monty opens door B}
  • c : \ \text{Monty opens door C}

Initially, the probability that you have selected the winning door is 1/3: P(A)=P(B)=P(C)=1/3. These probabilities are valid before Monty opens a door; we call them the a priori probabilities. Let us say that you choose door A. Monty now opens one of the other doors, one which does not conceal the main prize. We distinguish two situations:

  • Situation 1

    If the prize is behind your door (A), then Monty can open either of the remaining two doors (door B or C). Let us assume that his decision is random, so that each of the two doors is opened with probability 1/2.

  • Situation 2

    If the prize is not behind your door, then it has to be behind door B or C, and Monty has to open the other one (i.e. he opens it with probability 1).

Let us assume that Monty opens door B. Mathematically, this means

  • \text{Situation 1:}\ P(b|A) = \frac{1}{2}
  • \text{Situation 2:}\ P(b|C) = 1

In addition, P(b|B)=0, because Monty never opens the door that conceals the main prize.

As a player, you do not know which situation has occurred. When Monty opens the door, you can stick to your original decision or you can change it and open door C. Which decision is better, i.e. which of the doors A or C is more likely to conceal the main prize, if we know that Monty has opened door B? We would like to calculate the probabilities P(A|b) and P(C|b). The a priori probabilities were P(A)=P(C)=\frac{1}{3}. When Monty opens door B, we can calculate the a posteriori probabilities by applying Bayes’ rule and the theorem of total probabilities: \begin{align}
P(b)  &  =P(b|A)P(A)+P(b|B)P(B)+P(b|C)P(C)=\frac{1}{2}\cdot\frac{1}{3}+0\cdot\frac{1}{3}+1\cdot\frac{1}{3}=\frac{1}{2}\\
P(A|b)  &  =\frac{P(b|A)\cdot P(A)}{P(b)}=\frac{\frac{1}{2}\cdot\frac{1}{3}}{\frac{1}{2}}=\frac{1}{3}\\
P(C|b)  &  =\frac{P(b|C)\cdot P(C)}{P(b)}=\frac{1\cdot\frac{1}{3}}{\frac{1}{2}}=\frac{2}{3}\end{align} Changing your decision pays off!

Description of the interactive example: In this example you can choose the number of rounds n, the number of sixes X and the probability p of obtaining a six when throwing the loaded (“adjusted”) die. From these entries, you obtain the probability that the die used in the game was loaded. Read the following story carefully before you start the interactive example!

The story: Three siblings are playing dice. The youngest one (a boy) gave one die to each of his sisters. Each sister rolls her die n times, and the one who obtains a six most frequently wins. The sisters remember that one of the dice is “loaded”. The probability of obtaining a six with this die is 1/3; each of the other numbers has probability 2/15. The first sister rolled her die n times and obtained X sixes. The other sister wants to calculate the probability that this die is loaded. This can be done easily. Let us look at the actual number of sixes, which can be 0,1,2,\dots
,\ \text{or}\ n. For simplicity, suppose n=3. For a fair die we will write W=0, for a loaded die W=1. All throws are mutually independent and therefore we obtain:

P(X=0|W=0)=P(\text{no 6 in the three throws})=5/6\cdot5/6\cdot5/6=0.5787
P(X=1|W=0)=P(\text{just 1 six in the three throws})=1/6\cdot5/6\cdot5/6\cdot3=0.3472
P(X=2|W=0)=P(\text{exactly 2 sixes in the three throws})=1/6\cdot1/6\cdot5/6\cdot3=0.0694
P(X=3|W=0)=P(\text{all three throws give 6})=1/6\cdot1/6\cdot1/6=0.0046

For the same experiment with the loaded die (W=1) we obtain:

P(X=0|W=1)=2/3\cdot2/3\cdot2/3=0.2963
P(X=1|W=1)=1/3\cdot2/3\cdot2/3\cdot3=0.4444
P(X=2|W=1)=1/3\cdot1/3\cdot2/3\cdot3=0.2222
P(X=3|W=1)=1/3\cdot1/3\cdot1/3=0.0370

Let us say that the first sister obtains two sixes in her three throws (X=2). What is the probability that she played with the loaded die? We want to calculate the probability P(W=1|X=2). According to Bayes’ rule we have P(W=1|X=2)=\frac{P(X=2|W=1)P(W=1)}{P(X=2|W=0)P(W=0)+P(X=2|W=1)P(W=1)} Using P(W=1)=P(W=0)=1/2 gives 0.2222\cdot0.5=0.1111 in the numerator and 0.0694\cdot0.5+0.2222\cdot0.5=0.1458 in the denominator, so that P(W=1|X=2)=0.1111/0.1458=0.762.

The interactive example: Choose X (the number of sixes), n (the number of throws) and p (the probability of a six when the die is loaded) and let the computer calculate the probability P(W=1|X). Recommendation: always change the value of only one parameter at a time and observe the influence of the change on the result.

Assume 0.5 per cent of the population is infected with a particular virus that leads to acute disease only after a long period of time. A clinical study shows that 99 per cent of the individuals suffering from the symptoms that confirm an infection with the virus test positive. On the other hand, 2 per cent of people not developing the symptoms test positive as well. What is the probability that a person testing positive has the infection? Let us first formalise the problem. Instead of using the set theoretic notation we will now define indicator variables for the two binary variables corresponding to the infection (I) and the test (T): \begin{align}
I  &  =\begin{cases}
1 & \text{if a person is infected}\\
0 & \text{if a person is not infected}
\end{cases} \\
T  &  =\begin{cases}
1 & \text{if the test is positive}\\
0 & \text{if the test is not positive}
\end{cases}\end{align} Using the above we know the following probabilities.

P\left(  I=1\right)  =0.005
P\left(  T=1|I=1\right)  =0.99
P\left(  T=1|I=0 \right)  =0.02
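These three probabilities are all that Bayes’ rule needs for a two-event decomposition (infected, not infected). The following minimal Python sketch evaluates P(I=1|T=1) from them; the function name `posterior` and its parameter names are illustrative assumptions, and the text that follows derives the same quantity step by step.

```python
def posterior(prior, p_pos_given_inf, p_pos_given_not_inf):
    """Return P(I=1 | T=1) by Bayes' rule.

    prior               -- P(I=1)
    p_pos_given_inf     -- P(T=1 | I=1)
    p_pos_given_not_inf -- P(T=1 | I=0)
    """
    joint_inf = p_pos_given_inf * prior                # P(T=1 and I=1)
    joint_not = p_pos_given_not_inf * (1.0 - prior)    # P(T=1 and I=0)
    return joint_inf / (joint_inf + joint_not)         # divide by P(T=1)


p = posterior(prior=0.005, p_pos_given_inf=0.99, p_pos_given_not_inf=0.02)
print(round(p, 3))
```

Calling `posterior` with a larger `prior` shows how strongly the result depends on the assumed proportion of infected people among those tested.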

We would like to calculate P\left(  I=1|T=1\right)  . The definition of conditional probability contains probabilities which are not readily available: P\left(  I=1|T=1\right)  =\frac{P\left[  \left(  I=1\right)  \cap\left(  T=1\right)  \right]  }{P\left(  T=1\right)  }, \text{ for } P\left(  T=1 \right)  >0 To replace the numerator by a known quantity we rearrange P\left(  T=1|I=1\right)  =\frac{P\left[  \left(  I=1\right)  \cap\left(  T=1\right)  \right]  }{P\left(  I=1\right)  }, \text{ for } P\left(  I=1 \right)  >0 to yield P\left[  \left(  I=1\right)  \cap\left(  T=1 \right)  \right]  =P\left(  T=1|I=1\right)  P\left(  I=1\right) The denominator can be calculated using the theorem of total probabilities: P\left(  T=1\right)  =P\left(  T=1|I=1\right)  P\left(  I=1\right)  + P\left(  T=1|I=0\right)  P\left(  I=0\right)  . We thus get P\left(  I=1|T=1\right)  =\frac{P\left(  T=1|I=1\right)  P\left(  I=1\right)  }{P\left(  T=1|I=1\right)  P\left(  I=1\right)  + P\left(  T=1|I=0\right)  P\left(  I=0\right)  }. Performing the calculation, with P\left(  I=0\right)  =1-0.005=0.995, we obtain a somewhat surprising result: P\left(  I=1|T=1\right)  =\frac{0.99 \cdot0.005}{0.99 \cdot0.005 + 0.02 \cdot0.995}=0.199. Thus a randomly selected person who tests positive has an 80 per cent chance of not being infected. But don’t forget about one crucial assumption we have made: the proportion of infected people has to be the same in the population and in the sample of tested persons. This may be true for large-scale clinical tests. But in practice, there is usually a reason for testing a person, e.g. him/her having been exposed to an infected person.

In this example we will apply both the theorem of total probabilities and Bayes’ rule. Wolfram has a wine cellar. Having invited guests for a dinner party, he considers showing off in the most economical fashion. He knows that his guests usually buy their wine at the local supermarket. So he decides to provide above-average food and not to spend too much time choosing the accompanying wine. His stock currently consists of Qualitätswein, Kabinett and Spätlese in the proportions 5:3:2. The proportion of white wine in these classes is 1/5, 1/3 and 1/4, respectively. Being a technocrat, and not only in pedantically monitoring his stock, he wants to compute the probability of producing a bottle of white wine when randomly picking one. He estimates probabilities by their relative proportions in the stock population:

A_{1}\equiv\left\{  \text{Qualitätswein}\right\}  \quad P\left(  A_{1}\right)  =0.5
A_{2}\equiv\left\{  \text{Kabinett}\right\}  \quad P\left(  A_{2} \right)  =0.3
A_{3}\equiv\left\{  \text{Spätlese}\right\}  \quad P\left(  A_{3} \right)  =0.2

This classification establishes a disjoint decomposition of Wolfram’s wine stock: A_{1}\cup A_{2}\cup A_{3}=S and A_{1}\cap A_{2}=\emptyset, A_{1}\cap A_{3}=\emptyset, A_{2}\cap A_{3}=\emptyset. Let B represent the event of picking a bottle of white wine. Then we know:

P\left(  B|A_{1} \right)  =1/5
P\left(  B|A_{2} \right)  =1/3
P\left(  B|A_{3} \right)  =1/4

Being short of time, Wolfram decides to have the food delivered from a gourmet deli. Now he has spare time to draw a Venn diagram:

[Figure: Wolfram’s Venn diagram]

As A_{1},A_{2} and A_{3} establish a disjoint decomposition, A_{1}\cap B, A_{2}\cap B and A_{3}\cap B must be disjoint as well. Thus, writing B= \left(  A_{1}\cap B \right)  \cup\left(  A_{2}\cap B\right)  \cup\left(  A_{3}\cap B \right)  , we obtain \begin{align}
P\left(  B \right)   &  = P \left[  \left(  A_{1}\cap B \right)  \cup\left(
A_{2}\cap B\right)  \cup\left(  A_{3}\cap B \right)  \right] \\
&  = P \left(  A_{1}\cap B \right)  + P\left(  A_{2}\cap B\right)  + P\left(
A_{3}\cap B \right)\end{align} As he doesn’t know the probabilities of the intersections on the right-hand side, Wolfram applies the multiplication rule, substituting P\left(  B|A_{i}\right)  P\left(  A_{i}\right)  for P\left(  A_{i}\cap B\right)  : \begin{align}
P\left(  B \right)   &  = P\left(  B|A_{1}\right)  P\left(  A_{1}\right)  +
P\left(  B|A_{2}\right)  P\left(  A_{2}\right)  + P\left(  B|A_{3}\right)
P\left(  A_{3}\right) \\
&  = 1/5 \cdot0.5 + 1/3 \cdot0.3 + 1/4 \cdot0.2 = 0.25\end{align} Thus randomly selecting a bottle will result in a white wine with a probability of 25 per cent.

Given that Wolfram has selected a bottle of white wine, what is the probability that it is Qualitätswein, that is, what is P\left(  A_{1}|B\right)  ? Wolfram wants to apply the definition of conditional probability, P\left(  A_{1}|B \right)  =\frac{P\left(  A_{1}\cap B\right)  }{P\left(  B\right)  } He has already calculated P\left(  B\right)  using the theorem of total probabilities. But what about the numerator on the right-hand side? Wolfram chooses to rearrange the definition of the conditional probability of B given A_{1} to yield a multiplication rule he can substitute into the numerator: \begin{align}
P\left(  B|A_{1}\right)   &  =\frac{P\left(  A_{1}\cap B\right)  }{P\left(
A_{1}\right)  }\\
\Leftrightarrow P\left(  A_{1}\cap B\right)   &  = P\left(  B|A_{1}\right)
P\left(  A_{1}\right)\end{align} This yields \begin{align}
P\left(  A_{1}|B \right)   &  =\frac{P\left(  A_{1}\cap B\right)  }{P\left(
B\right)  }\\
&  =\frac{P\left(  B|A_{1}\right)  P\left(  A_{1}\right)  }{P\left(
B|A_{1}\right)  P\left(  A_{1}\right)  + P\left(  B|A_{2}\right)  P\left(
A_{2}\right)  + P\left(  B|A_{3}\right)  P\left(  A_{3}\right)  }\\
&  = \frac{P\left(  B|A_{1}\right)  P\left(  A_{1}\right)  }{\sum_{i=1}
^{3}P\left(  B|A_{i}\right)  P\left(  A_{i}\right)  }\\
&  = \frac{0.2 \cdot0.5}{0.25} = 0.4\end{align}
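
Wolfram’s two calculations can be reproduced in a few lines. The following is a minimal Python sketch using the stock proportions 5:3:2 and the white-wine shares 1/5, 1/3 and 1/4 given above; the function name `bayes` and the dictionary layout are illustrative assumptions. Exact fractions avoid rounding in the intermediate products.

```python
from fractions import Fraction

# Prior probabilities P(A_i) from the stock proportions 5:3:2.
priors = {
    "Qualitätswein": Fraction(5, 10),
    "Kabinett": Fraction(3, 10),
    "Spätlese": Fraction(2, 10),
}
# Conditional probabilities P(B|A_i): share of white wine in each class.
white_given_class = {
    "Qualitätswein": Fraction(1, 5),
    "Kabinett": Fraction(1, 3),
    "Spätlese": Fraction(1, 4),
}


def bayes(priors, likelihoods):
    """Return P(B) and the posteriors P(A_i|B) for a disjoint decomposition."""
    # Multiplication rule: P(A_i and B) = P(B|A_i) * P(A_i)
    joint = {k: likelihoods[k] * priors[k] for k in priors}
    # Theorem of total probabilities: P(B) is the sum of the joint terms
    p_b = sum(joint.values())
    # Bayes' rule: P(A_i|B) = P(A_i and B) / P(B)
    posteriors = {k: joint[k] / p_b for k in joint}
    return p_b, posteriors


p_white, posterior = bayes(priors, white_given_class)
print(p_white)                     # 1/4
print(posterior["Qualitätswein"])  # 2/5
```

The same call also returns the posteriors for Kabinett and Spätlese, and the three posteriors sum to one.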