## Bayes Theorem in practice: Kolmogorov Complexity

**This is the second post in a sequence titled “Basic concepts in the formalisation of thought.”**

In the previous post we looked at Bayes Theorem, an equation as below:

Basically, what this does is allow us to take a prior probability and produce a posterior probability given some evidence. So, for example, if we were trying to determine the probability the sun would rise tomorrow, we would start with our prior probability (very high) but then, if we learnt that an alien fleet was determine to destroy it tonight, we could use Bayes Theorem to recalculate this probability based on that new evidence.

In the equation, P(A) represents the prior probability and if we don’t know this figure then it becomes impossible to apply the equation. This post is going to explore Kolmogorov Complexity, one suggestion for how we can reason under these conditions.

**What is Kolmogorov Complexity?**

Kolmogorov Complexity is an attempt to formalise the lose concept of how complex a string is. It is defined as the length of the shortest program that can produce the string. Of course, the length on this program will differ depending on which programming language you use to implement it. However, it can be shown that the difference in this length can be no more than a constant.

To explain, let’s imagine that we originally used the programming language c++ to determine the complexity of a string, S. Now let’s say that instead we use java to do the same job. We could choose to write a c++ interpreter in java and then run the original program. That means that, at most, the complexity of the string in java can be equal to its complexity in c++ plus the length of the interpreter. So, while the choice of language can lead to different Kolmogorov Complexities, this difference is bounded.

**Kolmogorov Complexity and Bayes Theorem**

Ockham’s Razor famously says that, “All else being equal, the simplest theory is the best”, or at least something along those lines and this is the basic idea behind use Kolmogorov Complexity in Bayes Theorem. Our problem with Bayes Theorem was circumstances where we did not know the prior probability of a statement. Ockham’s Razor provides a response – the likelihood of the prior should be dependent on the simplicity of the theory. And Kolmogorov Complexity gives us a way to formalise this.

So the idea, under circumstances where no more information is available, is to define the prior, P(m), as:

2^{-K(m)}

Which has the desired format. Actually, this equation will not ensure probabilities sum to 1 and so, instead, we use what is called prefix-Kolmogorov complexity instead of standard Kolmogorov complexity. This prefix-Kolmogorov Complexity, L(m), is calculated as follows:

L(m) = k(m) + Log_{2}K(,)

And the prior probability is then calculated as:

2^{-L(m)}

In this case, the probabilities do add to one.

**Next post**

The next sequence of posts will explore causal discovery – that is, finding causal information from data.