Conditional probabilities and a debate about God (Part I)

Via Cosmic Variance, which I occasionally read, came the news two weeks ago of a public debate at the University of North Carolina between theologian William Lane Craig and cosmologist Lawrence M. Krauss, an atheist. You can watch it in its entirety on YouTube. The debate touches on issues of cosmology, probability, logic and philosophy. I do not feel particularly qualified to comment on most of the cosmological arguments – origins of the universe etc. – brought up in the debate (although looking into this would certainly be interesting). That would require reading up on a lot of highly nontrivial technical stuff I have so far had little or no training in. Consequently, this blog post would never be finished. ;)

But I will discuss some of the logical/philosophical/probabilistic aspects. This will involve discussing things that aren’t really my field of expertise, either, but since both participants in our exchange also venture out of theirs, that won’t stop me. Being an atheist/agnostic (depending on where you draw the line), I obviously believe that a lot of Craig’s reasoning is wrong and fallacious. Unfortunately, I also believe that this wasn’t pointed out with sufficient clarity in the debate (which the majority of the audience thought Craig won, and I probably have to agree). So let me try.

I decided to subdivide my commentary into two parts, only the second of which will really address the actual points of controversy between Krauss and Craig. The reason is that there is a mathematical issue that played a large role in Craig’s presentation, and which I thought I should first give readers of this blog some background information on: conditional probabilities. However, if you have a solid enough background in high school mathematics, you should already know all this stuff and can jump straight to the second part. The same is true if you are willing to ignore anything I say about conditional probabilities there and read it anyway.

Conditional probabilities and the definition of evidence

Craig starts his opening presentation by defining the word “evidence”. To him, a fact F is evidence of a hypothesis H (such as “God exists”), if the conditional probability of H given this fact is higher than the probability of H without this fact. Now, let me explain (very informally, without worrying about things like the exact definition of a probability measure etc.) what conditional probabilities are and how to work with them:

When you toss a coin, you can assume that the chances of getting heads or tails are “fifty-fifty”, i.e., heads is just as likely as tails. If you now get someone else to toss another coin at the same time, the results of your tosses obviously do not influence each other. Therefore, if you want to know the probability that both of you get tails, you can argue like this: In 50\% of all cases, I will get tails. Because the probability that my partner gets the same result is independent of mine, the chance of him getting tails in the 50\% of all cases where I get tails is still 50\%. So in 50\% of 50\% of all cases, we will have the desired outcome. That means its probability is 25\%. Note here that calculating 50\% of 50\% is the same as multiplying 50\% by 50\%.
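For readers who like to check such things on a computer, here is a small sketch of the two-coin argument. It computes the 25\% once by multiplying and once by simply listing all four equally likely outcomes; the function and variable names are my own choice for this illustration.

```python
from fractions import Fraction
from itertools import product

def prob_both(p_first, p_second):
    """Probability that two independent events both occur."""
    return p_first * p_second

# Fair coins: tails has probability 1/2 for each player.
p_tails = Fraction(1, 2)
p_both_tails = prob_both(p_tails, p_tails)

# Cross-check by listing all equally likely outcomes of two tosses.
outcomes = list(product("HT", repeat=2))            # HH, HT, TH, TT
favourable = [o for o in outcomes if o == ("T", "T")]
p_by_counting = Fraction(len(favourable), len(outcomes))

print(p_both_tails)   # prints 1/4
print(p_by_counting)  # prints 1/4
```

Both routes give one quarter, which is just the “50\% of 50\%” from above.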

Of course, there is nothing special about the 50\% probability. It would work just the same with some strange coins where one has a 10\% tails and 90\% heads chance, while it’s 60 to 40 for the other. What is important is that the events do not influence each other, nor is there some hidden factor in the background influencing both. They are “independent”. This isn’t actually the formal definition probability theorists use for independence of two random events. Instead, two events A and B would be called independent precisely if they have the property that the chance of both occurring, P(A \; and \; B), equals the product of the probability of A with the probability of B, P(A) \cdot P(B). But, as we saw, the events we called independent have this very property, which is the motivation why this formal definition is introduced in the first place.

On the other hand, there are obviously events which are not independent of each other: If lightning strikes in a certain area (“chance event A”), the probability that someone’s house there will be set on fire (“chance event B”) dramatically increases. Let us assume that A will occur on a given day with a likelihood of 2\% in the area we are talking about (it doesn’t matter here if the values are realistic), and that, e.g. from some actuarial table, we know that the likelihood of a day with both a lightning strike and a fire (of events A and B both occurring) is 0.4\%. Now, imagine the situation where someone tells you lightning has struck, but doesn’t know if any fire accident happened. In other words, rather than looking for the probability of A and B occurring, you are looking for the probability of B if the knowledge that A happened is already given. This is the conditional probability of B given A, which is generally written as P(B|A).

To calculate it, consider this: Out of 1000 days, you have 20 (2 percent) where lightning will strike in the neighbourhood we are investigating. And you have 4 days (0.4 percent) where lightning will strike and someone’s house will burn. This means that on 4 out of 20, or 20\%, of all days where we have a lightning strike, there will also be a fire. This is the probability we are looking for. The general formula hence is:

P(B|A)=\dfrac{P(A \; and \; B)}{P(A)}
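As a quick sanity check, the formula can be typed in directly with the numbers from the lightning example. This is only a minimal sketch; the helper name `conditional` is something I made up for the purpose.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_a):
    """P(B|A) = P(A and B) / P(A); only defined if P(A) is not zero."""
    if p_a == 0:
        raise ValueError("P(A) must be positive to condition on A")
    return p_a_and_b / p_a

p_lightning = Fraction(2, 100)            # P(A), 2% from the text
p_lightning_and_fire = Fraction(4, 1000)  # P(A and B), 0.4% from the text

p_fire_given_lightning = conditional(p_lightning_and_fire, p_lightning)
print(p_fire_given_lightning)  # prints 1/5
```

One fifth is the same 20\% we found by counting days.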

Please note that we just looked at the statistics of how often A and B both happen. This is not quite the same as the statistics of when A will actually be the cause for B. We do know, however, that in case of a lightning strike, the fire risk increases, because you have the possibility that it will cause a fire. But, of course, any other way a house could catch fire is just as possible in case of a lightning strike as it would have been without. So both of these cases have to enter the probability P(B|A). Also, that one event might cause the other is not the only possibility for real-world events to not be considered independent in a stochastic model. There might also be some common cause lurking in the background. If someone in your street dies of bubonic plague, you have an increased risk of dying from it, too. But that is not because his death will directly cause yours, it is because you have been exposed to him/the same sources of infection as him before his death.

From our formula, we of course find that P(A \; and \; B)=P(B|A)\cdot P(A). If A and B now are independent events, we also have P(A \; and \; B)=P(B)\cdot P(A). In this case, it follows P(B)\cdot P(A)=P(A \; and \; B)=P(B|A)\cdot P(A), and this means we must have P(B|A)=P(B). In other words, for independent events (for our ultimate purposes, perhaps it would be better to call them “facts”, but I’ll stick with “events” for now), the probability of the one event B given the other event A is just the same as the probability of B without any information about event A or anything else. In other words again, the probability of B doesn’t give a damn about A. Observe here that the reasoning that led us to the formula for conditional probability (and hence the formula) is valid in any case, whether our events are independent or dependent, whether one of them could cause the other or there is something lurking in the background that might cause both.

At this point, a word of warning is due: In general, P(A|B) is not the same as P(B|A). For example, if someone leaked us the information that the next Nobel literature prize will go to an American, it would be extremely unlikely for him to be Dan Brown (no offense!). P(Nobel \; prize \; for \; Dan \; Brown | Nobel \; prize \; for \; an \; American) is a value I believe one can safely assume to be way below 1\%. On the other hand, if the information leaked was that the next Nobel literature prize was to go to Dan Brown, the probability of the next literature Nobelist being American would be 100\%, because Dan Brown is American. P(Nobel \; prize \; for \; an \; American | Nobel \; prize \; for \; Dan \; Brown)=100\%=1.

So, how are these quantities related? Let us return to the lightning/fire problem we discussed above. One could also ask the other way around: If we know that there was a fire in the neighbourhood we are considering, how likely is it that lightning struck there before? We already know that P(B|A)=20\%, and now we are looking for P(A|B). Assume that, without a lightning strike, the fire risk on a given day, i.e. P(fire|no \; lightning), is 1\%. Then, out of 10,000 days, there are 200 (still 2\%) with a lightning strike, and the probability of fire given lightning is still 20\%, so that on 40 of these days, we have lightning and fire. On the other 160 lightning days, there is no fire. Now consider the 9800 days without lightning. Then 98 of them (one in a hundred) see someone’s house burning. That means there are 98+40=138 days in total where we have the fire event occurring. This means that the probability of this event B, without any information on whether there was lightning or not, is 138/10,000, or 1.38\%.
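The day counting above is easy to redo step by step. The following sketch just repeats the bookkeeping with integers, using the percentages from the text; the variable names are mine.

```python
from fractions import Fraction

days = 10_000

# 2% of all days see a lightning strike.
lightning_days = days * 2 // 100                        # 200
# On 20% of the lightning days there is also a fire.
fire_and_lightning = lightning_days * 20 // 100         # 40
# On the remaining days the fire risk is 1%.
no_lightning_days = days - lightning_days               # 9800
fire_without_lightning = no_lightning_days * 1 // 100   # 98

# Every fire day belongs to exactly one of the two groups.
fire_days = fire_and_lightning + fire_without_lightning
p_fire = Fraction(fire_days, days)

print(fire_days)  # prints 138
print(p_fire)     # prints 69/5000
```

The fraction 69/5000 is 138/10,000 in lowest terms.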

Note what we (not explicitly, but implicitly) did to obtain this: We effectively multiplied P(B|A) by P(A) to obtain P(A \; and \; B) (“…looked at the P(B|A)=20\% of the P(A)=2\% of events where, by the very definition of conditional probability, A and B occur…”) and then, multiplying P(B|not \; A) by P(not \; A), got P(B \; and \; not \; A) in the same way. Obviously, “A and B happen” and “B happens while A does not” cover all possible cases in which B can happen. Now, out of these 138 cases there are still the 40 cases where we will have lightning and fire, whose relative frequency is still given by P(B|A)\cdot P(A). So the probability we are looking for is 40/138 \approx 29\%. Since we get the (expected) number of occurrences (in this case, “days where it could happen”) of an event by multiplying the total number of “attempts” to get it (in this case the total number of days we are considering, or 10,000) by its probability, we get:

P(A|B)\\ \\ =\dfrac{\# \; cases \; where \; A \; and \; B \; occur}{\# \; cases \; where \; A \; and \; B \; occur \; + \; \# \; cases \; where \; B \; occurs, \; but \; A \; does \; not}\\ \\=\dfrac{10,000\cdot P(B|A)\cdot P(A)}{10,000\cdot P(B|A)\cdot P(A)+10,000\cdot P(B|not \; A)\cdot P(not \; A)}\\ \\= \dfrac{P(B|A)\cdot P(A)}{P(B|A)\cdot P(A)+P(B|not \; A)\cdot P(not \; A)} \\ \\ = \dfrac{P(B|A)\cdot P(A)}{P(B)}

(“#” is short for “number of” here. For the fourth line, remember cancelling down of fractions from middle school.) This result is known as Bayes’ theorem. Let us now look at a few features of this theorem:

a) This can, indeed, be seen as a theorem that tells us how strong B is as a piece of evidence for A, in the sense defined by Craig. In our example, knowing about a fire in our area raised the probability of a lightning strike from the 2\% it would have been without any information to 29\%, so P(A|B) is bigger than P(A).

When does this happen? Well, it happens precisely when \dfrac{P(B|A)}{P(B)} is a factor bigger than 1 (in our example, it’s roughly 14.5), i.e., the probability of B given A must be higher than the probability of B without any information about A. (Perhaps this is not too surprising a result.) If, on the other hand, the probability of B is lower if we know that A is true than without any information on A, P(A|B) is lower than P(A) and B is a piece of evidence against A. If the two probabilities are the same, we have independent events once again and B is just a neutral piece of information. Also, even though the probability of A given B rose to 29\%, it is still well below 50\%, which has to do with the following point:

b) The result is sensitive both to the probability that B will occur given A and the prior P(A), which is called so because it is the probability of A before we have any information concerning B. This is just the formal version of Krauss’ informal statement that “extraordinary claims require extraordinary evidence”. Indeed, you would probably accept the eyewitness account of a friend as sufficient evidence of a car accident having happened yesterday in your city. You wouldn’t as easily be convinced of the visit of aliens to his front garden. Why? Because of your prior knowledge that alien encounters are highly unlikely, while traffic accidents happen all the time. (Unless you live in certain rural areas of the United States. ;-) ) Similarly, while knowing about a fire in our example increased the probability of lightning by a factor of 14.5, it still stayed below 50\%, because the 2\%-prior we had was too low.
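To see how much the prior matters, one can feed Bayes’ theorem the same two conditional probabilities as in the lightning example (20\% and 1\%) and only vary P(A). The sketch below does this; the 2\% prior is the one from the text, while the 10\% and 50\% priors are invented purely for comparison, and the function name is my own.

```python
from fractions import Fraction

def posterior(prior, p_b_given_a, p_b_given_not_a):
    """Bayes' theorem: P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # P(B) is split into the two cases "A holds" and "A does not hold".
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b

p_fire_given_lightning = Fraction(20, 100)    # from the text
p_fire_given_no_lightning = Fraction(1, 100)  # from the text

# 2% is the prior from the example; 10% and 50% are hypothetical.
for prior in (Fraction(2, 100), Fraction(10, 100), Fraction(50, 100)):
    post = posterior(prior, p_fire_given_lightning, p_fire_given_no_lightning)
    factor = post / prior  # equals P(B|A) / P(B)
    print(f"prior {float(prior):.0%}: posterior {float(post):.1%}, "
          f"factor {float(factor):.2f}")
```

With the 2\% prior the posterior is exactly 20/69, the same number as 40/138. With the larger made-up priors the very same evidence pushes the posterior past 50\%, even though the factor by which it grows gets smaller.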

c) As we saw, we can write P(B) as P(B|A)\cdot P(A)+P(B|not A)\cdot P(not A). We could also go further and distinguish between different subcases of “not A”. In our example, assume that the only other possible reason for a fire is an electrical accident in the household. Then:

P(fire) \\ = P(fire|lightning, \; no \; accident)\cdot P(lightning, \; no \; accident)\\ + P(fire|el. \; accident, \; no \; lightning)\cdot P(el. \; accident, \; no \; lightning) \\ + P(fire | both \; events)\cdot P(both \; events) \\ + P(fire | neither \; event)\cdot P(neither \; event)

The last summand is, of course, zero, because we assumed that fire can only break out in case at least one of our two possible reasons occurs. Also,

P(fire|lightning \; strikes, \; no \; accident) \cdot P(lightning \; strikes, \; no \; accident) \\+P(fire | both \; events)\cdot P(both \; events) \\= P(fire \; and \; lightning, \; no \; accident)+P(fire \; and \; lightning \; and \; el. \; accident)\\ =P(fire \; and \; lightning)\\ =P(fire | lightning)\cdot P(lightning),

quite obviously. So,

P(fire) \\ =P(fire | lightning)\cdot P(lightning) \\ +P(fire | el. \; accident, \; no \; lightning)\cdot P(el. \; accident, \; no \; lightning).

If we admit a number of alternate explanations – rather than just two – as possible, we can sum the corresponding terms to get P(B) in the same way, as long as we consider all explanations (say explanation 1, 2, 3, 4…) and they are mutually exclusive:

P(B) \\ = P(B|expl \; 1)\cdot P(expl \; 1) \\ +P(B|expl \; 2 \; and \; expl \; 1 \; untrue)\cdot P(expl \; 2 \; and \; expl \; 1 \; untrue) \\ +P(B|expl \; 3 \; and \; expl \; 1 \; and \; 2 \; untrue)\cdot P(expl \; 3 \; and \; expl \; 1 \; and \; 2 \; untrue)\\ +P(B|expl \; 4 \; and \; expl \; 1, \; 2, \; 3 \; untrue)\cdot P(expl \; 4 \; and \; expl \; 1, \; 2, \; 3 \; untrue)\\ +...

(Assuming all previous explanations to be untrue in each summand is what makes sure we really are dealing with mutually exclusive cases, which is crucial for this to hold – just as, above, the events “lightning strike” and “electrical accident” weren’t mutually exclusive, but the events “lightning strike” and “electrical accident, but no lightning strike” were.)

When will P(expl \; 1|B) be bigger than 50\%? Well, this will be the case if \dfrac{P(B|expl \; 1)\cdot P(expl \; 1)}{P(B)} is bigger than that, which means P(B) must be less than twice as large as P(B|expl \; 1)\cdot P(expl \; 1). By the way we have written P(B) above, this is precisely so when the sum of the P(B|alternate \; explanation)\cdot P(alternate \; explanation)-terms is smaller than P(B|expl \; 1)\cdot P(expl \; 1). In other words, the probability that B occurs and explanation 1 is realized must be greater than the probability that B occurs and any one of the other explanations is true. (This shouldn’t be too surprising, either.)
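The same bookkeeping works for any number of mutually exclusive explanations. As a rough sketch, the function below takes the products P(B|explanation)\cdot P(explanation) for each case, adds them up to get P(B), and divides. I feed it the two cases from the lightning example (“lightning” and “no lightning”); the function name and the dictionary layout are just my choice for this illustration.

```python
from fractions import Fraction

def posteriors(joint_terms):
    """Map each mutually exclusive explanation to P(explanation | B).

    joint_terms holds P(B | explanation) * P(explanation) for every case;
    the cases must exclude each other and together cover all ways B can occur.
    """
    p_b = sum(joint_terms.values())
    return {name: term / p_b for name, term in joint_terms.items()}

# Numbers from the lightning example: 40 and 98 out of 10,000 days.
joint = {
    "lightning": Fraction(20, 100) * Fraction(2, 100),
    "no lightning": Fraction(1, 100) * Fraction(98, 100),
}
post = posteriors(joint)

for name, term in joint.items():
    others = sum(joint.values()) - term
    # An explanation passes 50% exactly when its term beats all others combined.
    print(name, post[name], post[name] > Fraction(1, 2), term > others)
```

For “lightning” both comparisons come out false (40 is less than 98), for “no lightning” both come out true, which is the criterion just described.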

Let me conclude by, once again, stressing that it isn’t necessary for all this to be true that the random events we are considering actually are explanations of B. It is only that it makes sense for our example, and mostly also for the debate I am going to review. But, as already explained, there could be other ways B could be evidence for something in our probabilistic sense.

This lengthy introduction to the Bayesian calculus and conditional probability could seem overly trivial to some people, overly complex to others with less mathematical training, and even both to a third subgroup of readers (OK, I don’t really have any yet, but imagine that I had). But, as I said above, I believed it might be helpful to explain more exactly what is meant when ideas like conditional probabilities or priors are brought up, and also how some of these arguments could be stated in plain English.

In the next part, let us investigate the actual evidence given in the debate.
