## Categories and types of intervention

**This is post 8 in a sequence exploring formalisations of causality entitled “Reasoning with causality”.**

This post will summarise the paper Varieties of Causal Intervention and, in doing so, will explore what a causal intervention is. To start with, imagine a Bayes Net representing a causal process. Let’s say that cleaning your teeth and your genes both affect the chances of dental decay, but that the same genes also influence the chance that you’ll clean your teeth:

Now imagine there’s a government policy being considered whereby police would enforce tooth cleaning. To analyse the effects of this we would intervene on “Clean”, which we can think of as setting the value of the variable while ignoring the influence of any parents, and we would then observe the effects of the intervention on decay.

**Intervention vs Observation**

This concept of intervention is different from observation. Imagine, for example, that we observed someone cleaning their teeth. This may have a different effect from intervening to make them clean their teeth, as it suggests they’re more likely to have certain genes, and these genes also make it less likely that their teeth will decay. Thus, in this case, the observation of tooth cleaning would be stronger evidence against decay than an intervention would be.

The important point is that the two concepts are different because intervention surgically removes the influence of any parent nodes.
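
The contrast above can be sketched numerically. The probabilities below are illustrative assumptions (the post gives none), but they reproduce the qualitative point: observing cleaning updates our beliefs about genes, while intervening does not.

```python
# Toy Bayes Net: genes -> clean, genes -> decay, clean -> decay.
# All numbers are illustrative assumptions, not taken from the paper.

P_good = 0.5                           # P(genes = good)
P_clean = {'good': 0.9, 'bad': 0.3}    # P(clean | genes)
P_decay = {('good', True): 0.1, ('good', False): 0.3,
           ('bad', True): 0.3, ('bad', False): 0.7}   # P(decay | genes, clean)

def p_genes(g):
    return P_good if g == 'good' else 1 - P_good

def observe_clean():
    """P(decay | clean): conditioning on cleaning updates beliefs about genes."""
    joint = {g: p_genes(g) * P_clean[g] for g in ('good', 'bad')}
    z = sum(joint.values())
    return sum(joint[g] / z * P_decay[(g, True)] for g in ('good', 'bad'))

def do_clean():
    """P(decay | do(clean)): forcing cleaning cuts the genes -> clean arrow,
    so the genes distribution stays at its prior."""
    return sum(p_genes(g) * P_decay[(g, True)] for g in ('good', 'bad'))

print(observe_clean())   # lower: observing cleaning is evidence of good genes
print(do_clean())        # higher: intervening tells us nothing about genes
```

With these numbers, observation gives a decay probability of 0.15 against 0.2 under intervention, so the two really do come apart.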

**Making interventions complex**

The simple view of interventions, expressed in most of the literature on the topic, is that they are deterministic, always achieve the desired effect and affect only one variable. However, interventions in the real world often fail to match these simplified assumptions. For example, an intervention to pressure someone to clean their teeth might fail. Or an intervention might affect multiple variables, rather than just one. And finally, an intervention may be indeterministic and fail to set a variable to a specific value.

To model this, rather than simply changing the value of a variable in the system (setting clean to true, in the above example), the paper suggests introducing a new parent node for the target variable, one which is introduced to push the variable towards some particular target distribution and which sits outside the system. See below:

This intervention node will be binary (yes/no) but its interaction with other nodes is left open, so the particular target distribution may not be achieved. Note that intervention variables can be parents of more than just their target variable, allowing side effects to be modelled.
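
One way to sketch this is to give the target variable an extra parent and an enlarged conditional probability table. The success probability below is an illustrative assumption, standing in for an enforcement policy that sometimes fails:

```python
# Sketch of the paper's proposal: augment the net with a binary intervention
# node I as a new parent of the target "clean". All numbers are illustrative.

P_clean_given_genes = {'good': 0.9, 'bad': 0.3}   # original CPT for clean
SUCCESS = 0.8   # assumed chance that the enforcement actually works

def p_clean(genes, intervene):
    """CPT of 'clean' with the intervention node added as an extra parent."""
    if not intervene:                  # I = no: the net behaves as before
        return P_clean_given_genes[genes]
    # I = yes: push clean towards certainty, but the intervention may fail,
    # in which case the old parents reassert themselves.
    return SUCCESS * 1.0 + (1 - SUCCESS) * P_clean_given_genes[genes]

print(p_clean('bad', intervene=False))   # baseline for bad genes
print(p_clean('bad', intervene=True))    # raised, but not all the way to 1
```

Because the intervention node is just another parent, the usual Bayes Net machinery handles imperfect interventions with no special cases.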

**Types of intervention**

An intervention then leads to a new probability distribution over the states of the targeted variable. The types of intervention possible are:

- An independent intervention is one where the results of the intervention do not depend on the other parents of the variable. This changes the probability distribution of the target to that intended by the intervention. A dependent intervention instead calculates the new probability distribution from both this intended distribution and the other parents. So imagine a gene therapy intervention that fixes the effect of bad genes on not cleaning teeth. For someone with bad genes this will decrease the chance of decay. For someone who already has the good genes, it will have no effect. This would be a dependent intervention.
- A deterministic intervention aims to achieve a specific effect, say forcing people to clean their teeth. That’s to say, it aims to set a variable to a specific value. It may still be complicated to model if the intervention is dependent. A stochastic intervention, by contrast, aims to leave the target variable with a distribution over more than one state.
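
The independent/dependent distinction can be made concrete with a small sketch. The numbers and the hypothetical gene-therapy rule below are illustrative:

```python
# Independent vs dependent interventions, with illustrative numbers.

P_clean = {'good': 0.9, 'bad': 0.3}   # P(clean | genes) before intervening

def independent(genes, target=0.95):
    # Independent: the intended target distribution ignores the other parents.
    return target

def dependent(genes):
    # Dependent (the gene-therapy example): cancel the effect of bad genes,
    # so the result still depends on which genes the person has.
    return P_clean['good'] if genes == 'bad' else P_clean[genes]

assert independent('good') == independent('bad')   # parents are irrelevant
assert dependent('good') == P_clean['good']        # no effect on good genes
assert dependent('bad') == P_clean['good']         # bad genes "fixed up"
```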

**Conclusion**

Representing interventions as the introduction of a new parent node rather than as a surgical change of the value of a variable allows a wider range of types of intervention to be modelled.

## Probabilistic causality

**This is post 7 in a sequence exploring formalisations of causality entitled “Reasoning with causality”.**

Many of the posts on this blog in relation to causality have a hidden assumption: namely, that causality is inherently probabilistic (though many of the posts are still just as relevant whether this is accepted or not). However, this goes against the mainstream view. This post will summarise part of the paper Varieties of Causal Intervention, which argues against the mainstream.

**Deterministic causality**

Judea Pearl, one of the principal researchers in the area of causality, has argued that causality should be interpreted deterministically for the following reasons:

- The deterministic interpretation is the more intuitive one.
- Deterministic interpretations are more general as any indeterministic interpretation can be modelled as a deterministic one.
- A deterministic causality is needed to make sense of counterfactuals and causal explanation.

This post is going to ignore the first point (intuitive doesn’t mean right) and will explore possible responses to the other two.

**Deterministic causality as the more general theory**

Imagine a system where both A and B have a causal influence on C. We can model that with the following equation:

C = dA + eB + U

This says that C is influenced to degree d by A and to degree e by B. It also says that C isn’t entirely determined by A and B: there is some degree of variation, captured by U. By adding U in, a case that was originally indeterministic has become deterministic. However, that doesn’t mean the causal system being modelled is in fact deterministic unless U is part of the system. Given that, in practice, U is normally defined as whatever is left over, claiming it as part of the system seems reasonable only if we presume from the start that the system is deterministic.
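
The "U is whatever is left over" move can be shown directly. The coefficients and noise level below are illustrative; the point is that U is defined from the data rather than measured as part of the system:

```python
import random

# Sketch: make an indeterministic system deterministic by promoting the
# residual U to a variable. d, e and the noise scale are illustrative.

d, e = 0.5, 2.0
random.seed(0)

samples = []
for _ in range(5):
    A, B = random.random(), random.random()
    C = d * A + e * B + random.gauss(0, 0.1)   # indeterministic: noisy C
    U = C - d * A - e * B                      # U defined as what is left over
    samples.append((A, B, C, U))

# With U included, C is now a deterministic function of (A, B, U):
for A, B, C, U in samples:
    assert abs(C - (d * A + e * B + U)) < 1e-12
```

The equation holds by construction, which is exactly the worry: nothing about the data forced a deterministic reading.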

Given that indeterministic worlds are consistent, it seems to be an a posteriori question whether causality is deterministic or indeterministic, and hence such an a priori assumption seems unwarranted.

Even if you don’t buy all that, there’s a further point: this process can be applied both ways. Any indeterministic system can be modelled as a deterministic one, but any deterministic system can also be modelled as an indeterministic one.

Neither way is more general.

**Indeterministic causality and causal explanation**

In a previous post I discussed an approach by the same author to type and token causality that claims to present an indeterministic account of causal explanation. As such, the third of the problems listed above seems to be solved.

**The next post is Categories and types of intervention**

## Is a causal interpretation of Bayes Nets fundamental?

**This is post 5 in a sequence exploring formalisations of causality entitled “Reasoning with causality”. This post continues to summarise the paper “Causal Reasoning with Causal Models”.**

**The previous post in the sequence is An introduction to Bayesian networks in causal modeling**

Is a causal interpretation of Bayes Nets fundamental in some way, or is it simply accidental? To what extent can Bayes Nets be considered suitable causal models?

**Bayes Nets as representations of probability distributions**

A common argument that the causal interpretation of Bayes Nets isn’t fundamental runs as follows: Bayes Nets are designed to represent probability distributions, and given any Bayes Net with a causal interpretation, another one can be generated that represents the same probability distribution but does not have the causal interpretation. On the surface, then, the Bayes Net which is a causal model seems no more fundamental than the other.

Chickering’s Arc Reversal Rule can be used to support this assertion. This rule essentially reverses the direction of all the arrows. A technical detail of how it works means that the rule can introduce arrows but not remove them. From this, there seems to be an obvious way in which Bayes Nets do fundamentally represent causality: the causal model is the one with the fewest arrows that still manages to capture the relevant probability distribution.

Unfortunately, this doesn’t work.

**Why the simplest Bayes Net isn’t the causal model**

There are circumstances under which the causal model is not the simplest Bayes Net that captures the probabilistic dependencies. Imagine the following situation: sunscreen decreases instances of cancer but increases the time people spend in the sun (because they feel safer), which in turn increases the chance of cancer (modelled as below):

Now imagine that the two effects perfectly balance, so that sunscreen makes no difference to whether you get cancer. The simplest model that captures the related probability distribution is far simpler than the one above. It looks like this:

The only problem is, this doesn’t represent the causal system. So the causal model cannot be the simplest one which captures the probability distribution.

**Augmented Causal Simplicity**

Which leads to a more complicated conjecture: a causal model is the one with the fewest arrows that still manages to capture the relevant fully augmented probability distribution. In a basic sense, the fully augmented probability distribution simply means the probabilities regardless of how we intervene in the model. So say we intervene in the above model to set the amount of time spent in the sun. In the first graph, this means sunscreen no longer changes the amount of time spent in the sun, but it does still change the chance of getting cancer. The second model, by contrast, cannot capture this probabilistic relationship once we set the time spent in the sun.

So by demanding that a model capture the probability distribution under all possible interventions, the causal model can still be said to be the simplest one that captures the distribution. This then implies that the Bayes Net representation of causality is fundamental.
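
The sunscreen example can be simulated to show observational equivalence coming apart under intervention. The linear model and all coefficients below are illustrative assumptions chosen so that the two causal paths cancel exactly:

```python
import random

# Sunscreen S -> time in sun T -> cancer C, plus a direct S -> C arrow whose
# effect exactly cancels the indirect path. Coefficients are illustrative.
random.seed(1)

a = 0.8          # T = a*S + noise
b = 0.5          # C = b*T + c*S + noise
c = -a * b       # chosen so the two paths cancel: total effect of S on C is 0

def simulate(n, do_T=None):
    """Draw samples of (S, C); do_T cuts the S -> T arrow and fixes T."""
    rows = []
    for _ in range(n):
        S = random.gauss(0, 1)
        T = do_T if do_T is not None else a * S + random.gauss(0, 0.1)
        C = b * T + c * S + random.gauss(0, 0.1)
        rows.append((S, C))
    return rows

def cov(rows):
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    return sum((x - mx) * (y - my) for x, y in rows) / n

# Observationally, S and C look independent (the paths cancel) ...
print(abs(cov(simulate(20000))))         # close to 0
# ... but under do(T = 1), the direct S -> C effect shows up:
print(cov(simulate(20000, do_T=1.0)))    # clearly negative
```

The arrowless model matches the first number but not the second, which is exactly why the fully augmented distribution singles out the causal graph.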

**The next post in the sequence is Modeling type and token causal relevance**

## An introduction to Bayesian networks in causal modeling

**This is post 4 in a sequence exploring formalisations of causality entitled “Reasoning with causality”**

**The previous post was Reasoning with causality: An example**

The remainder of this sequence is going to depart from the path previously indicated (continuing to read Pearl’s *Causality*) and will instead explore the use of Bayesian networks in causal modeling (in doing so, it will also discuss probabilistic vs deterministic causality) by summarising a technical report titled *Causal Reasoning with Causal Models* (available at http://www.csse.monash.edu.au/~korb/pubs/techrept.pdf). This first post will introduce the use of Bayesian networks in causal modeling.

**What are Bayesian Networks?**

A Bayesian Network is a directed acyclic graph (DAG) together with the conditional probabilities of its nodes. What does this mean? Well, take the graph (a series of nodes connected by edges) below:

1.) Directed simply means that the edges are arrows: each points from one node to another.

2.) Acyclic means that if you follow the arrows along any path you will never return to your starting point.

So now we know what the DAG aspect of the definition above means. How about the conditional probabilities? That simply means that the Bayesian Network (or Bayes Net) also captures the way the variables are conditionally dependent on one another. Take the central node (“grass wet”). Let’s define the conditional probabilities for this node in the form of a table:

| Sprinkler | Rain | Grass wet | Grass dry |
| --- | --- | --- | --- |
| T | T | 1 | 0 |
| T | F | 0.7 | 0.3 |
| F | T | 0.9 | 0.1 |
| F | F | 0 | 1 |

So this says that if it rained and the sprinkler was on last night, the grass will definitely be wet. If just the sprinkler was on, there’s a 0.3 chance that the grass has dried by now, and so on.

So a Bayes Net is a DAG combined with the conditional probabilities that hold between the nodes. Bayes Nets are used because they make many problems computationally cheaper (provided the list of relevant conditional probabilities is small enough).
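
To see the table in action, we can marginalise out sprinkler and rain to get the overall chance of wet grass. The table values come from above; the priors on sprinkler and rain are illustrative assumptions, since the post only gives the conditional table:

```python
# P(grass wet | sprinkler, rain), copied from the table above.
P_wet = {(True, True): 1.0, (True, False): 0.7,
         (False, True): 0.9, (False, False): 0.0}

P_sprinkler = 0.3   # assumed prior, not given in the post
P_rain = 0.4        # assumed prior, not given in the post

def p(event, prob):
    """Probability that a binary variable takes the value `event`."""
    return prob if event else 1 - prob

# Marginalise: P(wet) = sum over sprinkler and rain of the joint probability.
p_wet = sum(p(s, P_sprinkler) * p(r, P_rain) * P_wet[(s, r)]
            for s in (True, False) for r in (True, False))
print(round(p_wet, 3))
```

With these priors the four joint terms are 0.12, 0.126, 0.252 and 0, giving a marginal of 0.498.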

**Bayesian Networks and conditional dependencies**

If there is no path connecting two nodes in a Bayes Net then they must be probabilistically independent. Which is to say:

P(A | B) = P(A)

Conditional independence extends this to situations where a third variable induces independence. So, in the above graph, rain and the paper being wet are conditionally independent given that the grass is wet, because the grass being wet “screens off” the paper from the rain.
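
Screening off can be verified mechanically by building the full joint distribution and conditioning. The probabilities below are illustrative:

```python
from itertools import product

# Chain: rain -> grass wet -> paper wet, with illustrative numbers.
P_rain = 0.4
P_wet = {True: 0.9, False: 0.2}     # P(grass wet | rain)
P_paper = {True: 0.8, False: 0.1}   # P(paper wet | grass wet)

def pr(value, p_true):
    return p_true if value else 1 - p_true

# Full joint distribution over (rain, wet, paper).
joint = {(r, w, pp): pr(r, P_rain) * pr(w, P_wet[r]) * pr(pp, P_paper[w])
         for r, w, pp in product([True, False], repeat=3)}

def cond(paper, wet, rain=None):
    """P(paper | wet [, rain]) computed directly from the joint."""
    num = sum(v for (r, w, pp), v in joint.items()
              if w == wet and pp == paper and (rain is None or r == rain))
    den = sum(v for (r, w, pp), v in joint.items()
              if w == wet and (rain is None or r == rain))
    return num / den

# Conditioning additionally on rain changes nothing once grass-wet is known:
print(cond(True, wet=True))              # P(paper wet | grass wet)
print(cond(True, wet=True, rain=True))   # the same value: rain is screened off
```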

There are two properties related to this:

1.) A Bayes Net is said to have the Markov Property if all of the conditional independencies in the graph are true of the system being modelled (the Markov Property for processes says that the future state of a system depends only on the present state and not on past ones, which is clearly related to conditional independence). Such a net can also be called an independence-map or I-map of the system.

2.) If all of the dependencies in the Bayes Net are true of the system then it’s said to be faithful to the system (or a D-map, Dependence Map).

If a Bayes Net has both of these properties then it’s said to be a perfect map. Generally in Bayes Nets, we want the I-map to be minimal (such that if we removed any arrows, it would no longer be an I-map).

**Bayesian Networks and Causality**

A Bayes Net becomes a causal model if each of the arrows in the graph can be given a causal interpretation. Each arrow then represents direct causation. Indirect causation runs along paths that always follow the arrows. So in the above graph, rain is indirectly causally related to the paper being wet via the path (rain -> grass wet -> paper wet). The sprinkler being on is not indirectly causally related to rain because the path from sprinkler to rain does not always follow the direction of the arrows.

A path from X to Y is blocked by Z if X and Y are conditionally independent given Z (at least under some basic assumptions). If the graph has the Markov Property then the equivalent procedure to blocking is d-separation, which is best explained as follows. Given two variables A and B, there are four ways that a path (ignoring the direction of the arrows) can go from A to B through C:

1.) It can go via a chain A->C->B. In this case A and B are d-separated given a set of nodes Z if C is in Z (if you know C, knowing A won’t tell you anything more about B).

2.) It can go via a chain B->C->A (reasoning as above).

3.) They can both be caused by C. In this case, A and B are d-separated given Z if C is in Z (once again, knowing C means that knowing A or B tells you nothing more about the other variable).

4.) They can both cause C. In this case, if C isn’t in Z then there is no dependence between A and B by default: they may both cause C, but knowing one doesn’t tell you anything about the other. On the other hand, if C is in Z then A and B become conditionally dependent. Look at the graph earlier: if we know the grass is wet then there is a conditional relationship between rain and sprinkler, namely that if it didn’t rain the sprinkler must have been on and vice versa. So A and B are d-separated given Z if C is not in Z (strictly, if neither C nor any of its descendants is in Z).

All up, A and B are d-separated given Z if all paths from A to B meet one of the above criteria.
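
The case analysis above translates fairly directly into code. This is an illustrative brute-force checker (enumerate every path, test whether each is blocked), not an efficient algorithm, and it includes the descendants-of-a-collider detail:

```python
# A small d-separation checker following the case analysis above.

def descendants(graph, node):
    """All nodes reachable from `node` by following arrows."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def all_paths(graph, a, b):
    """All simple paths from a to b, ignoring arrow direction."""
    edges = {(x, y) for x, ys in graph.items() for y in ys}
    nodes = {n for e in edges for n in e}
    def step(path):
        if path[-1] == b:
            yield path
            return
        for n in nodes:
            if n not in path and ((path[-1], n) in edges or (n, path[-1]) in edges):
                yield from step(path + [n])
    yield from step([a])

def path_blocked(graph, path, z):
    edges = {(x, y) for x, ys in graph.items() for y in ys}
    for i in range(1, len(path) - 1):
        prev, mid, nxt = path[i - 1], path[i], path[i + 1]
        if (prev, mid) in edges and (nxt, mid) in edges:
            # Collider (case 4): blocked unless it, or a descendant, is in Z.
            if mid not in z and not (descendants(graph, mid) & z):
                return True
        elif mid in z:
            # Chain or common cause (cases 1-3): blocked when observed.
            return True
    return False

def d_separated(graph, a, b, z):
    return all(path_blocked(graph, p, z) for p in all_paths(graph, a, b))

# Sprinkler example: sprinkler -> wet <- rain, wet -> paper.
g = {'sprinkler': ['wet'], 'rain': ['wet'], 'wet': ['paper']}
print(d_separated(g, 'sprinkler', 'rain', set()))     # True: collider blocks
print(d_separated(g, 'sprinkler', 'rain', {'wet'}))   # False: explaining away
print(d_separated(g, 'rain', 'paper', {'wet'}))       # True: chain blocked
```

Note the last clause of case 4 in action: conditioning on “paper”, a descendant of the collider “wet”, also unblocks the sprinkler-rain path.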

**Conclusions**

This post has explored Bayesian Networks and how they can be used as causal models. The following posts will explore objections to the use of Bayes Nets to represent causality, the debate over probabilistic vs deterministic interpretations of causality, and the difference between type and token causality.

**The next post in the sequence is Is a causal interpretation of Bayes Nets fundamental?**