Reasoning with causality: An example
This is post 3 in a sequence exploring formalisations of causality entitled “Reasoning with causality”
The previous post was “A causal calculus: processing causal information”
The last two posts have introduced a graphical method and a calculus for discussing causality. This post will demonstrate how these can be used for causal reasoning by following one of Pearl’s examples – an exploration of the causal relationship between smoking and lung cancer in a deliberately simplified world as follows: Smoking causes lung cancer via the intermediary of building up tar deposits in the lungs. There is also a genetic feature that both increases the chance of developing cancer and increases the chance that a person smokes.
The first thing we need in order to think about this is the relevant causal graph, which shows both of these causes of cancer: smoking → tar → cancer, plus the genetic feature with arrows into both smoking and cancer. The question we now need to answer is how strong these links are; more precisely, what is the probability of getting cancer under do(smoking)?
So we’re trying to discover the probability of cancer given do(smoking), or:

$$P(c \mid do(s))$$
The rest of the proof uses the rules of do() introduced in the last post to remove the do() operator, so that the problem can be solved with the ordinary rules of probability (the first few steps are explained explicitly; after that, you should have the tools to work out the steps for yourself).
Step 1
The first step is to state that the above equation is equivalent to:

$$P(c \mid do(s)) = \sum_t P(c \mid do(s), t)\, P(t \mid do(s))$$
This follows from the axioms of probability (it is what is usually called the law of total probability):
Any probability P(a) can be written as a sum over a set of exhaustive, mutually exclusive events b: for each b, the probability of b multiplied by the probability of a given b. For example: A school has four classes of 25 students. All 100 students are currently gathered in a hall for a meeting. They are mixed up at random. What is the probability that a student selected at random is the tallest in their class?
You can reason as follows: there are four tallest students (because there are four classes) and 100 students, so the probability is 4/100. Or you could think in terms of classes, which are mutually exclusive (no student is in two classes) and exhaustive (every student is in a class). Each class holds 25 of the 100 students, so you can ask: what’s the probability of this student being in class 1? ¼. And what’s their probability of being the tallest in their class? 1/25. So the probability that the student is the tallest in class 1 is ¼ × 1/25 = 1/100. You could then find the same value for the other three classes and sum these together to once again get 4/100.
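The two ways of counting in the school example can be checked with exact fractions. This is a small Python sketch, and the variable names in it are only illustrative.

```python
from fractions import Fraction

# Direct count: one tallest student per class, four classes, 100 students.
direct = Fraction(4, 100)

# Partition by class (exhaustive, mutually exclusive):
# P(tallest) = sum over classes k of P(in class k) * P(tallest | in class k)
p_class = Fraction(25, 100)             # each class holds 25 of the 100 students
p_tallest_given_class = Fraction(1, 25)
by_class = sum(p_class * p_tallest_given_class for _ in range(4))

print(direct, by_class)                 # both reduce to 1/25, i.e. 4/100
```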
Similarly, the probability of cancer given do(smoking) is equal to the sum, over all possible tar levels, of the probability of cancer given do(smoking) and that level of tar in the lungs, multiplied by the probability of that level of tar given do(smoking). If that’s not clear, think of it in terms of the school example, with tar levels playing the role of classes.
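To see Step 1 with numbers, here is a minimal Python sketch. It assumes binary variables and made-up, hypothetical parameters for the genotype u, tar t and cancer c. None of these numbers come from Pearl or from real data. Under do(s) the arrow from the genotype into smoking is cut, so the sketch builds the intervened joint distribution directly and checks that summing over tar levels gives back P(c | do(s)).

```python
from itertools import product

# Hypothetical parameters (made up for illustration only).
# u = genotype, t = tar, c = cancer; all binary (0 = no, 1 = yes).
P_U = {0: 0.7, 1: 0.3}                          # P(u)
P_T1_GIVEN_S = {0: 0.1, 1: 0.9}                 # P(t=1 | s)
P_C1_GIVEN_TU = {(0, 0): 0.05, (0, 1): 0.15,    # P(c=1 | t, u)
                 (1, 0): 0.30, (1, 1): 0.50}


def bern(p, x):
    """Probability that a binary variable with P(x=1) = p takes the value x."""
    return p if x == 1 else 1 - p


def joint_do_s(s):
    """Joint P(u, t, c | do(s)). Under do(s) the arrow from u into s is cut,
    so P(s | u) plays no part and s is simply fixed."""
    return {(u, t, c): P_U[u] * bern(P_T1_GIVEN_S[s], t) * bern(P_C1_GIVEN_TU[(t, u)], c)
            for u, t, c in product((0, 1), repeat=3)}


def lhs(s):
    """P(c=1 | do(s)), read straight off the intervened model."""
    return sum(p for (u, t, c), p in joint_do_s(s).items() if c == 1)


def rhs(s):
    """Step 1: sum over tar levels of P(c=1 | do(s), t) * P(t | do(s))."""
    joint = joint_do_s(s)
    total = 0.0
    for t in (0, 1):
        p_t = sum(p for (u, tt, c), p in joint.items() if tt == t)        # P(t | do(s))
        p_c_given_t = sum(p for (u, tt, c), p in joint.items()
                          if tt == t and c == 1) / p_t                    # P(c=1 | do(s), t)
        total += p_c_given_t * p_t
    return total


for s in (0, 1):
    print(f"s={s}: lhs={lhs(s):.4f}  rhs={rhs(s):.4f}")
```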
Step 2
The next step is to show that this is equal to:

$$\sum_t P(c \mid do(s), do(t))\, P(t \mid do(s))$$
This is an application of Pearl’s second rule, used here to replace the observation t with the action do(t) (you may want to have the rules post open for reference). However, Pearl’s rules can only be used if a certain precondition is met. You may have noticed that each of the preconditions in the previous post ended with something like:

$$G_{\overline{X}\underline{Z}}$$
This notation tells you which graph the precondition must be met in. So rather than checking it against the graph we drew above, we check it against a subgraph defined as follows: if there’s a bar above a letter, all edges going into that letter are removed, and if there’s a bar below, all edges going out of that letter are removed. So in the above example all arrows going into X and out of Z are removed. In this step X is smoking and Z is tar, so the precondition must be met in the subgraph with the genotype → smoking and tar → cancer arrows deleted, which leaves only smoking → tar and genotype → cancer.
The precondition is that c and t must be conditionally independent given s in this subgraph (which is to say that once you know s, also knowing t won’t change your probability for c). In this subgraph that plainly holds: the arrow from t to c has been deleted, and the only other route between them, t ← s ← genotype → c, was broken when the arrow into s was removed, so t and c are not connected at all.
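The subgraph construction can also be sketched in Python. The functions `mutilate` and `connected` are hypothetical helpers written for this illustration. Rather than a full d-separation test, the sketch uses a simpler sufficient condition: if t and c are not joined by any path in the subgraph, no conditioning set can make them dependent, so the precondition holds. A general-purpose check would need a proper d-separation algorithm.

```python
# Causal graph of the simplified world; "u" is the genetic feature (genotype).
EDGES = {("u", "s"), ("u", "c"), ("s", "t"), ("t", "c")}


def mutilate(edges, cut_incoming=(), cut_outgoing=()):
    """Build the subgraph G_{overline{X} underline{Z}}: drop arrows into any node in
    cut_incoming (bar above) and arrows out of any node in cut_outgoing (bar below)."""
    return {(a, b) for a, b in edges
            if b not in cut_incoming and a not in cut_outgoing}


def connected(edges, x, y):
    """True if x and y are joined by any path, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


sub = mutilate(EDGES, cut_incoming={"s"}, cut_outgoing={"t"})
print(sorted(sub))                 # [('s', 't'), ('u', 'c')]
print(connected(sub, "t", "c"))    # False: no path, so the precondition holds
```

Running `connected(EDGES, "t", "c")` on the full graph instead reports `True`, which is why the check has to be made in the subgraph rather than in the original graph.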
Remaining steps
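The rest of the proof proceeds as follows, ending with a probability equation that does not contain any do() operators. Each step works in a similar way to those above: one of Pearl’s rules for do() is applied once the appropriate subgraph has been checked to meet that rule’s precondition:

$$
\begin{aligned}
P(c \mid do(s)) &= \sum_t P(c \mid do(s), do(t))\, P(t \mid do(s)) \\
&= \sum_t P(c \mid do(t))\, P(t \mid do(s)) && \text{Rule 3: } (C \perp S \mid T) \text{ in } G_{\overline{S}\,\overline{T}} \\
&= \sum_t P(c \mid do(t))\, P(t \mid s) && \text{Rule 2: } (T \perp S) \text{ in } G_{\underline{S}} \\
&= \sum_t P(t \mid s) \sum_{s'} P(c \mid do(t), s')\, P(s' \mid do(t)) && \text{sum over smoking levels, as in Step 1} \\
&= \sum_t P(t \mid s) \sum_{s'} P(c \mid t, s')\, P(s' \mid do(t)) && \text{Rule 2: } (C \perp T \mid S) \text{ in } G_{\underline{T}} \\
&= \sum_t P(t \mid s) \sum_{s'} P(c \mid t, s')\, P(s') && \text{Rule 3: } (S \perp T) \text{ in } G_{\overline{T}}
\end{aligned}
$$

The final line contains only ordinary probabilities of smoking, tar and cancer; the genetic feature does not appear in it.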
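The derivation ends at what Pearl calls the front-door formula, P(c | do(s)) = Σ_t P(t | s) Σ_{s'} P(c | t, s') P(s'), which uses only probabilities of smoking, tar and cancer. As a sanity check, the Python sketch below uses hypothetical binary parameters (made up for illustration, not taken from Pearl or from data) and treats the genotype as unmeasured, as in Pearl's version of the example. It compares the formula, computed from the observational distribution alone, with the true interventional probability computed from the full model.

```python
from itertools import product

# Hypothetical parameters (illustrative only); u is the unmeasured genotype.
P_U1 = 0.3                                       # P(u=1)
P_S1_GIVEN_U = {0: 0.2, 1: 0.8}                  # P(s=1 | u)
P_T1_GIVEN_S = {0: 0.1, 1: 0.9}                  # P(t=1 | s)
P_C1_GIVEN_TU = {(0, 0): 0.05, (0, 1): 0.15,     # P(c=1 | t, u)
                 (1, 0): 0.30, (1, 1): 0.50}


def bern(p, x):
    """Probability that a binary variable with P(x=1) = p takes the value x."""
    return p if x == 1 else 1 - p


# Observational joint P(s, t, c) with u summed out: all a researcher could measure.
obs = {}
for u, s, t, c in product((0, 1), repeat=4):
    p = (bern(P_U1, u) * bern(P_S1_GIVEN_U[u], s)
         * bern(P_T1_GIVEN_S[s], t) * bern(P_C1_GIVEN_TU[(t, u)], c))
    obs[(s, t, c)] = obs.get((s, t, c), 0.0) + p


def prob(s=None, t=None, c=None):
    """Marginal probability over the observed variables; None means 'sum it out'."""
    return sum(p for (ss, tt, cc), p in obs.items()
               if (s is None or ss == s) and (t is None or tt == t) and (c is None or cc == c))


def front_door(s):
    """sum_t P(t | s) * sum_{s'} P(c=1 | t, s') * P(s'), from observational data only."""
    total = 0.0
    for t in (0, 1):
        p_t_given_s = prob(s=s, t=t) / prob(s=s)
        inner = sum(prob(s=s2, t=t, c=1) / prob(s=s2, t=t) * prob(s=s2) for s2 in (0, 1))
        total += p_t_given_s * inner
    return total


def true_effect(s):
    """Ground truth P(c=1 | do(s)), computed with access to the hidden genotype u."""
    return sum(bern(P_U1, u) * bern(P_T1_GIVEN_S[s], t) * P_C1_GIVEN_TU[(t, u)]
               for u, t in product((0, 1), repeat=2))


for s in (0, 1):
    naive = prob(s=s, c=1) / prob(s=s)           # plain conditioning, confounded by u
    print(f"s={s}: do(s)={true_effect(s):.4f}  front-door={front_door(s):.4f}  P(c|s)={naive:.4f}")
```

With these numbers the naive conditional probability P(c=1 | s=1) comes out noticeably higher than P(c=1 | do(s=1)), because smokers in this model are more likely to carry the genetic feature. The front-door value matches the interventional one.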
Conclusion
This is where the graphical and calculus-based approaches to causality come together and begin to let us reason with the causal information we are given. In Causality, Pearl’s proof is much shorter and more to the point. Here I’ve tried to give more detail on the first few steps to make the proof clear, but if you feel you’re drowning in detail, read Pearl’s presentation and see whether you find it clearer.
So far I have been focusing on how causal information can be manipulated once you have it using the do() calculus and Pearl’s graphical methods. In the next lot of posts, I will be exploring more about how Bayesian Networks can be used as causal models.
The next post in the sequence is An introduction to Bayesian networks in causal modeling
