Friday, April 3, 2015

Axing Occam's Razor Part 1 (Overview of complaints)

Occam's razor, as it is typically stated in words, is "The simplest explanation is always the best." I don't know of anyone who considers that version of Occam's razor to be sufficiently precise to be treated as a theory of epistemology. A reasonably common epistemology, notably popularized by the rationalist movement though it predates that movement by quite a while, is the computational, information-theoretic version of Occam's razor, which says that theories should be assigned prior probabilities according to their minimum length as measured in bits. Typically, people don't specify anything beyond this. The minimum length in bits of any program in one programming language is, after all, somewhat related to its minimum length in any other programming language. You can pretty easily show that each of any pair of Turing-complete languages can completely describe the operations of the other, and from there you can prove that for every pair of languages (P, Q) and every integer N, there exists an integer M such that any program that can be expressed in N or fewer bits in P can be expressed in M or fewer bits in Q. This constraint puts only a very weak bound on the relationship between languages. You can also pretty easily prove that for every state S, there exists a programming language P(S) such that S can be expressed in only one bit in P(S). (At this point P(S) is a contrived language specifically designed to permit this hack, so I think it is only fair to reference S in the description of the language P(S).)
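
To make that weak bound concrete, here is a minimal sketch (the interpreter length is invented, and everything named here is hypothetical) of why the bound holds: translating a P-program into Q by prefixing a fixed interpreter adds only a constant number of bits, but nothing constrains how large that constant is.

    # Sketch of the translation bound between Turing-complete languages P and Q.
    # A "program" is modeled as a bit string; the interpreter length is made up.

    INTERPRETER_P_IN_Q = "0" * 120   # hypothetical 120-bit interpreter for P, written in Q

    def translate(program_in_p: str) -> str:
        """Express a P-program in Q by prefixing the fixed interpreter."""
        return INTERPRETER_P_IN_Q + program_in_p

    def length_bound(n_bits: int) -> int:
        """Any P-program of <= n_bits is expressible in Q in at most this many bits."""
        return n_bits + len(INTERPRETER_P_IN_Q)

    print(length_bound(1000))   # M = N + 120: a constant overhead, whatever it is

The constant depends only on the pair (P, Q), which is exactly why the bound says so little: nothing stops it from being astronomically large.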

People who are serious about providing a computational, information-theoretic version of Occam's razor must specify (at least vaguely) what they believe the ideal programming language should be like.

I'm not a fan of Occam's razor, so the choice of language I would recommend is itself not consistent with Occam's razor. I would consider anything that wasn't specifically engineered to minimize the length of all of its most validated theories a very weak candidate. (This would have to be a language that revises itself in response to evidence and has a system of pointers that permits frequently referenced concepts to be reused with ease; keeping track of which language you ought to be using would be computationally intractable, because such languages are reflexive in complicated ways... but that's beyond the scope of the current conversation. I will discuss it later.) Most people pick something far more arbitrary. Actually, most people avoid picking anything at all, which is worse than picking something arbitrary.
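
As a toy illustration of the pointer idea (my own sketch; the 8-bit pointer width, and the use of characters as stand-ins for bits, are both invented), the first mention of a concept pays its full description length, and every later mention pays only a short pointer:

    # Toy pointer scheme: full description length once, a short pointer thereafter.

    seen = set()

    def cost(description: str) -> int:
        if description in seen:
            return 8                  # hypothetical 8-bit pointer to a known concept
        seen.add(description)
        return len(description)       # full description, paid only the first time

    message = ["someone carried the box"] * 5
    print(sum(cost(part) for part in message))   # far cheaper than five full copies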

But among the people I have read who do put some effort into describing which programming language they would use as the basis of their version of Occam's razor, the most popular choice tends to be binary lambda calculus. Actually, if you are truly a proponent of Occam's razor, this is pretty much the right choice, since it is probably very close to the simplest programming language that can be devised. It autocompiles very easily (requiring few bits to describe itself), which, if you truly have strong Occam priors, ought to be a requirement for any self-consistent basis for your theory. Of course, anyone can devise a programming language that autocompiles with a single one-bit instruction, which would make it simpler in one sense. However, any such language is going to be much more challenging to implement in other languages than binary lambda calculus is. Binary lambda calculus is very simple in most pairwise language comparisons. It is very difficult to come up with another programming language that other languages can compile with as little difficulty as most programming languages can compile binary lambda calculus.
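
For concreteness, John Tromp's binary lambda calculus encodes lambda terms over de Bruijn indices with just three cases: 00 for abstraction, 01 for application, and a unary code for variables. A minimal sketch of the resulting length measure (my own implementation of that published encoding):

    # Bit-length of a term under binary lambda calculus.
    # Terms: ("lam", body), ("app", f, x), or a de Bruijn index i >= 1.

    def encode(term) -> str:
        if isinstance(term, int):                  # variable i -> '1' * i + '0'
            return "1" * term + "0"
        if term[0] == "lam":                       # abstraction -> '00' + body
            return "00" + encode(term[1])
        if term[0] == "app":                       # application -> '01' + f + x
            return "01" + encode(term[1]) + encode(term[2])
        raise ValueError("unknown term")

    identity = ("lam", 1)                          # \x. x
    print(encode(identity), len(encode(identity)))  # '0010', 4 bits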

One of the arguments I would make against Occam's razor is that many programming languages more complicated than binary lambda calculus are practically guaranteed to outperform it as the language to use for measuring length. In particular, binary lambda calculus does not have a good enough system of pointers to make the explanation "someone carried the box" simpler than "the box teleported itself." Teleportation is practically guaranteed to be the simplest explanation for motion in this sort of system, by many, many orders of magnitude. Simply updating the position stored for the object is always far easier than specifying that somebody or something else moved it, and the degree to which this explanation is likely to be simpler overwhelms any available evidence. It is likely to be simpler by thousands of bits, setting the prior odds against any alternative at 2 ** 1000 to 1 even in optimistic projections.
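
To spell out the arithmetic (the individual lengths below are invented; only the thousand-bit gap comes from the argument above), a bit-length prior proportional to 2 ** -length turns that gap directly into prior odds:

    # Toy arithmetic for a bit-length prior: a 1000-bit gap alone fixes the odds.

    len_teleport = 50                      # invented: overwrite the stored position
    len_carried = len_teleport + 1000      # invented: specify an agent, a grasp, a path...

    gap = len_carried - len_teleport
    print(f"prior odds favoring teleportation: 2**{gap} to 1")   # 2**1000 to 1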

In short, I think there are very strong information-theoretic counterarguments to the validity of information-theoretic formulations of Occam's razor, and I think the theory deserves to be refuted on those grounds, but I also think there is an even stronger logical refutation of Occam's razor that deals with "true statements are true" sorts of tautologies.

In particular:

The simplest explanation is always the simplest.
Equivalent theories are always equivalent.
The most persuasive explanation is always the most persuasive.
The most probable explanation (given available data) is always the most probable explanation (given available data).
Reflexively self-consistent explanations are always reflexively self-consistent.
The most accurate explanation is always the most accurate.
The epistemology that produces the most predictive priors is the epistemology that produces the most predictive priors.
Self-improving epistemologies are self-improving, whereas static epistemologies are static.
Etc.

We can model each of these separately and see that each contains some elements of consistency with Occam's razor and some elements of inconsistency with it -- the most damning being that considerations about persuasiveness and reflexive self-consistency would lead us to realize that statements like Occam's razor would be likely to be believed even if they had no merit.

We can even create an epistemology that (assuming the basic accuracy of statistics) is guaranteed to be asymptotically equivalent to Occam's razor if and only if no better alternative exists.
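
A minimal sketch of one such construction (my own illustration, not necessarily the construction intended here): treat each candidate epistemology as an expert that assigns probabilities to outcomes, and reweight the experts by predictive success. Assuming the basic accuracy of statistics, the mixture asymptotically follows whichever expert predicts best -- Occam's razor included, if and only if nothing beats it.

    # Bayesian reweighting over candidate epistemologies ("experts").

    def update(weights, predictions, outcome):
        """Multiply each expert's weight by the probability it assigned
        to the observed outcome, then renormalize."""
        new = {e: w * predictions[e][outcome] for e, w in weights.items()}
        total = sum(new.values())
        return {e: w / total for e, w in new.items()}

    weights = {"occam": 0.5, "alternative": 0.5}
    # one hypothetical round of predictions over outcomes "A" and "B"
    predictions = {"occam": {"A": 0.9, "B": 0.1},
                   "alternative": {"A": 0.6, "B": 0.4}}
    weights = update(weights, predictions, "A")
    print(weights)   # the better predictor of "A" gains weight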

This brings me to my final note, which is that most epistemologies that include Occam's razor alongside another theory are themselves inconsistent with Occam's razor. It is possible to devise a system of priors from the assumption that probability theory is more-or-less correct, in which case you have a formal system that takes only probability theory as its axioms -- and this system is itself information-theoretically simpler than a system that also has axioms for weighting simplicity.

Some of these considerations may be a little abstruse, but most of them should be relatively straightforward to anyone with the mathematical background required to understand the information-theoretic formulation of Occam's razor to begin with.
