Information theory

From EvoWiki

Introduction

Information is a tricky thing to define. Intuitively, the more information a message contains, the more you can learn from it. But how do we measure this?

See also

Evolution of new information

Shannon's Information Theory

Claude Shannon, a mathematician at Bell Labs, laid a mathematical foundation for answering such questions by developing information theory in the 1940s.

In Shannon's information theory, we deal with messages. A message is a series of symbols. This may be a sequence of ASCII characters, a stream of bits, or some other sequence of symbols from a given alphabet.

A message can have several values. If you ask someone what sex they are, the answer can be either "Male" or "Female" (2 possible values). If you ask what time it is, the answer can be "12:00 a.m.", "12:01 a.m.", "12:02 a.m.", all the way to "11:59 p.m." (1440 values).

Each message value Xi occurs with a probability p(Xi). If you ask a random person their sex, the probability that the person will say "Female" is about 0.5, since roughly half of all people are female.

Entropy and information value

In Shannon's information theory, the information content of a message is given by its entropy, defined as:

H(X) = - sum over i of p(Xi)*log2(p(Xi))

where X is the message, and Xi is the ith possible value of the message.

The function we are summing over,

-p(Xi)*log2(p(Xi))

is zero when p(Xi) = 0 (taking 0*log2(0) to be 0, its limiting value) and zero when p(Xi) = 1, and rises to a maximum in between, at p(Xi) = 1/e (about 0.37).

Intuitively, we see that if p(Xi) = 0, then this particular value will never be sent, so it does not contribute to the information content. If p(Xi) = 1, then the message will always have value Xi, so we already know what the message says without reading it, and the information content is zero. The entropy H(X) as a whole is greatest when all message values are equally probable; that is when the message contains the most information.
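The behaviour of the summand and of the full sum can be checked numerically. The following is a minimal sketch in Python; the function names entropy_term and shannon_entropy are chosen here for illustration and do not come from any library.

```python
import math

def entropy_term(p):
    """One summand -p*log2(p), written as p*log2(1/p); taken to be 0 when p == 0."""
    if p == 0:
        return 0.0
    return p * math.log2(1.0 / p)

def shannon_entropy(probs):
    """H(X) in bits for a list of probabilities that sum to 1."""
    return sum(entropy_term(p) for p in probs)

print(entropy_term(0.0))             # a value that is never sent
print(entropy_term(1.0))             # a value that is always sent
print(shannon_entropy([0.5, 0.5]))   # two equally probable values
print(shannon_entropy([0.9, 0.1]))   # a lopsided pair
```

The two single-value cases contribute nothing, and the evenly split pair has a higher entropy than the lopsided one.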

Information and DNA

So what does this mean for the creationist claim that mutations and natural selection cannot increase the amount of information in DNA?

Let's start with a population of 1000 individuals. At a particular location in their DNA, 500 of these individuals have the sequence "AAAA" and 500 have the sequence "CCCC". How much information does this sequence contain?

From the definition above, we see that p(AAAA) = 0.5 and p(CCCC) = 0.5. So H = -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1.000 bit.

Now let's suppose that a mutation has occurred, and we now have 500 individuals with "AAAA", 499 individuals with "CCCC", and one mutant with "CCCT". What is the information content now?

-p(AAAA) * log2(p(AAAA)) = 0.50000
-p(CCCC) * log2(p(CCCC)) = 0.50044
-p(CCCT) * log2(p(CCCT)) = 0.00997

Thus H = 0.50000 + 0.50044 + 0.00997 = 1.01041

So the mutation has added 0.01041 bits of information to the genetic pool.
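The arithmetic for this example can be reproduced with a few lines of Python. This is only a sketch: pool_entropy is a name made up here, and it treats the observed frequencies in the population as the probabilities p(Xi).

```python
import math
from collections import Counter

def pool_entropy(sequences):
    """Shannon entropy (bits) of the observed frequencies of sequences in a population."""
    counts = Counter(sequences)
    total = len(sequences)
    # -p*log2(p) with p = n/total, written as p*log2(total/n)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

before = ["AAAA"] * 500 + ["CCCC"] * 500
after = ["AAAA"] * 500 + ["CCCC"] * 499 + ["CCCT"]

h_before = pool_entropy(before)
h_after = pool_entropy(after)
print(f"before: {h_before:.5f} bits")
print(f"after:  {h_after:.5f} bits")
print(f"change: {h_after - h_before:+.5f} bits")
```

Rounded to five decimal places, this gives 1.00000 bits before the mutation and 1.01041 bits after it.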

Creationist claims


Genetics, complexity and randomness

It's a common creationist claim that mutations cannot increase genetic information. The argument runs something along these lines: mutations introduce randomness and thereby decrease rather than increase the information; the complexity of genetic information cannot be increased by random mutations; therefore all genetic information now extant must always have been here.

However, in information theory complexity and randomness are positively correlated. In Kolmogorov-Chaitin information theory a string is algorithmically complex or algorithmically random if it has a high entropy (relative to its length). The problem for the creationists is that they want "complex" to mean "non-random"; but information theory can't really help them here.

For a concrete example, read Nancy Pearcey's article DNA: The Message in the Molecule. Pearcey writes:

The information content of any structure is defined as the minimum number of instructions needed to specify it. For example, a random pattern of letters has a low information content because it requires very few instructions: 1) Select a letter of the English alphabet and write it down, and 2) Do it again. A highly ordered but repetitive pattern likewise has low information content. Wrapping paper with "Merry Christmas" printed all over in ornate gold letters is highly ordered, but it can be specified with very few instructions: 1) Write "M-e-r-r-y C-h-r-i-s-t-m-a-s," and 2) Do it again. By contrast, a structure with high information content requires a large number of instructions. If you want your computer to print out the poem "'Twas the Night Before Christmas," you must specify every letter, one by one. There are no shortcuts. This is the kind of order we find in DNA. It would be impossible to produce a simple set of instructions telling a chemist how to synthesize the DNA of even the simplest bacterium. You would have to specify every chemical "letter," one by one.

What Pearcey refers to in the first line is Kolmogorov-Chaitin entropy. However, "a random pattern of letters" will have a high, not a low, entropy. The entropy is measured by the instructions needed to produce this particular string and not some other string. Toss a coin 100 times and you will get some sequence of heads and tails. Toss a coin another 100 times, and you'll most likely get a very different sequence. In Pearcey's wording they would be the same sequence, because they can both be produced by tossing a coin 100 times; that is, by a simple repetition of a simple instruction. But how would Pearcey detect whether some sequence of heads and tails is "random" or designed? Pearcey's definition of randomness is statistical randomness, which isn't the same as algorithmic randomness: a string can be statistically random without being algorithmically random (the Champernowne sequence is an example).
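The Champernowne sequence makes the distinction concrete: it is produced by writing the positive integers one after another, so a very short program generates as much of it as you like, yet Champernowne proved that in the long run every digit appears equally often. The following is a minimal sketch in Python; the helper names champernowne_digits and prefix are chosen here for illustration.

```python
from collections import Counter
from itertools import count, islice

def champernowne_digits():
    """Yield the decimal digits of 1, 2, 3, ... written one after another."""
    for n in count(1):
        yield from str(n)

def prefix(length):
    """First `length` digits of the sequence as a string."""
    return "".join(islice(champernowne_digits(), length))

print(prefix(30))

# Digit counts over every number from 1 to 99999 (488,889 digits in total).
counts = Counter(prefix(488_889))
for digit in "0123456789":
    print(digit, counts[digit])
```

Over this range the nine nonzero digits occur exactly equally often, while zero lags behind because numbers are not written with leading zeros; that gap closes only slowly as the sequence grows. The program itself stays a few lines long however many digits are requested, which is what makes the sequence algorithmically simple.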

If, as Pearcey writes, we have to specify every chemical 'letter' one by one, then we have something that is algorithmically random, something with high entropy. But that is the opposite of how design detection is supposed to work in, for instance, William Dembski's Explanatory Filter.


Another example is supplied by Timothy Wallace in his paper Five Major Evolutionist Misconceptions about Evolution (http://www.trueorigin.org/isakrbtl.asp), which is actually a response to a TalkOrigins article by Mark Isaak. Wallace quotes Isaac Asimov as follows:

“Another way of stating the second law then is: ‘The universe is constantly getting more disorderly!’ Viewed that way, we can see the second law all about us. We have to work hard to straighten a room, but left to itself it becomes a mess again very quickly and very easily. Even if we never enter it, it becomes dusty and musty. How difficult to maintain houses, and machinery, and our bodies in perfect working order: how easy to let them deteriorate. In fact, all we have to do is nothing, and everything deteriorates, collapses, breaks down, wears out, all by itself—and that is what the second law is all about.”
[Isaac Asimov, Smithsonian Institute Journal, June 1970, p. 6]

and later writes:

Now, the entire universe is generally considered by evolutionists to be a “closed” (isolated) system, so the 2nd law dictates that within the universe, entropy is increasing. In other words, things are tending to breaking down, becoming less organized, less complex, more random on a universal scale. This trend (as described by Asimov above) is a scientifically observed phenomenon—i.e., fact, not theory.

The problem here is that Asimov, in his well-known role as a popularizer of science, was not necessarily giving the best possible explanation of the 2nd law. What Boltzmann, who was the first to equate entropy with disorder, meant was that increasing entropy in a system corresponds to increasing unpredictability of the internal state of that system. For instance, we can rely on gravity to make sure that dust in a room will fall to the floor: gravity induces a gradient, a low entropy state, and the dust falls down to equalize the gradient. If we do not remove the dust, we can rely on finding it on the floor every time we check. Without gravity, dust particles could be anywhere, and that's disorder!

But which definition?

The argument above clearly depends on using Shannon's definitions of "information", "entropy", and so forth. If we used different definitions, the results might show a loss of "information".

Occasionally, a creationist will cite a theorem showing that information cannot increase. This, too, comes from Shannon. But consider his environment: he was working for Bell Labs, a research laboratory at AT&T. What they wanted to do was to take a message from a sender (say, a telegram), compress it as much as possible, in order to be able to send more telegrams per minute along a given wire, but then deliver the message to the recipient exactly as it was sent. By definition, AT&T could either retain the information in the message, or it could garble the message (lose information), but it could not add new information!

To understand what this means, consider that someone might want to send the message "2+2=5". The message is, as stated, false. Nonetheless, it is the message being sent. Suppose something occurs during transmission and one of the characters is changed. Any of the following changes would result in a true statement:

  • 2+2=4
  • 2+3=5
  • 3+2=5

There are, of course, a huge number of changes that would make the statement just as wrong as the original. The important thing to realize is that, while from our perspective changing the message to one of the three above makes it better than the original (since it's now true), they are all decreases in information according to Shannon's model.

If the message "2+2=5" becomes "2+2=4", that is loss of information. If the message "2+2=4" becomes "2+2=5", this is also loss of information. This occurs because "100% information" is arbitrarily defined as "whatever occurs at the starting point." If I mutate the word "Fred" enough times, I can produce the entire text of Hamlet, but according to Shannon's model this is also loss of information!

Consider the following example. Fred wants to send a telegraph message from Atlanta to San Francisco, but there is no direct line, so he must send it through Houston, where an operator will relay his message. Fred tells the telegraph operator in Atlanta to send "2+2=5" to Houston, with instructions to send it on to a party in San Francisco, but the Atlanta operator makes a mistake and renders it "2+2=4". This is loss of information. The Houston operator receives "2+2=4" and relays the message to San Francisco, but he also makes a mistake, turning "2+2=4" back into "2+2=5".

Is this second change loss or gain of information? Well, it depends on your perspective. From the point of view of the Houston operator, information has been lost, because what he sent does not match the message he was (erroneously) told to send. From Fred's point of view, this second change has caused an information gain, restoring what was originally lost. In Shannon's model, information is measured only with respect to an arbitrarily defined "correct" piece of data. The second change is both an increase and a decrease, depending on perspective.

This shows two important flaws in creationist arguments that use Shannon's model. First, it shows that information CAN be gained; there is nothing in Shannon's model that says it cannot. Second, it reinforces that the information content is measured against an arbitrarily chosen starting point. Yes, Shannon's model does provide a cap to information accumulation, but this is an illusion. If a flat worm evolves into a mammal, this is loss of information, but only because the flat worm was where we started. Likewise, if that mammal then evolves back into a flat worm, it would again be loss of information, but only because we are now measuring from the mammal.
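The dependence on the chosen reference can be illustrated with a short Python sketch. It counts mismatched characters against a reference message, which is only a crude stand-in for "information lost" and is not Shannon's actual measure; the function name mismatches and the variable names are made up for this example.

```python
def mismatches(reference, received):
    """Count positions where received differs from reference (equal lengths assumed)."""
    if len(reference) != len(received):
        raise ValueError("messages must have the same length")
    return sum(a != b for a, b in zip(reference, received))

fred_sent = "2+2=5"
houston_received = "2+2=4"   # after the Atlanta operator's mistake
sf_received = "2+2=5"        # after the Houston operator's mistake

first_hop = mismatches(fred_sent, houston_received)     # measured against Fred's message
second_hop = mismatches(houston_received, sf_received)  # measured against what Houston was given
end_to_end = mismatches(fred_sent, sf_received)         # measured against Fred's message

print(first_hop, second_hop, end_to_end)
```

Each hop introduces one error relative to what that operator was handed, yet the message that arrives in San Francisco has no errors relative to what Fred wrote. Whether the second change counts as a loss or a gain depends entirely on which message is taken as the reference.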

Nothing in Shannon's model says macroevolution is impossible; it only arbitrarily calls whatever happens a decrease in information because it is a change.

On the other hand, Shannon's concepts may work very well when considering DNA (and the RNA/protein matrix) in biotic systems. Mutations in a message may add information that is not desirable to the receiver; in fact, in biotic systems this is generally the case. But occasionally mutation information is actually a benefit to the receiver. For example, a mutation in a message originally offering $100 (for some work) to the receiver may actually be delivered as $101, a small but recognizable positive difference.

Chaitin-Kolmogorov Information

Also known as algorithmic information theory. Assume some processing machine with an input tape and an output tape. The input tape will contain a program string P, which may contain input data besides the code. After execution of P (assuming the execution ever halts) the output tape will contain a string S.

The complexity C(P) of P is the length in symbols of P, and the entropy H(S) of S is the length of the shortest (= least complex) program that can generate S; in our case obviously H(S) ≤ C(P). Note that the entropy function depends on the instruction set of the processing machine and is in general not a computable function, meaning that there is no program E that, given any string S as input, is guaranteed to halt and output H(S).

A string S is algorithmically random if H(S) ≥ |S| (= the length of S), that is, if S cannot be generated by a program that is shorter than S. While for any natural number N the existence of algorithmically random strings of length N can be proved, a consequence of the above is that it cannot in general be proved that some particular string S is algorithmically random.
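Although H(S) cannot be computed, any compressor gives a rough upper bound on it: the compressed data, together with a fixed decompression routine, is one program that regenerates S. The sketch below uses Python's standard zlib module for this purpose. It is an illustration under that assumption only; a string that zlib fails to shrink may still have a short description that zlib cannot find, so the result never proves a string to be algorithmically random.

```python
import os
import zlib

def compressed_length(data):
    """Length in bytes of the zlib-compressed form of data, a crude upper bound on description length."""
    return len(zlib.compress(data, 9))

repetitive = b"Merry Christmas " * 625   # 10,000 bytes of a repeating pattern
random_like = os.urandom(10_000)         # 10,000 bytes from the operating system's random source

print(len(repetitive), compressed_length(repetitive))
print(len(random_like), compressed_length(random_like))
```

The repeating pattern shrinks to a small fraction of its length, while the random bytes do not shrink at all. That is the sense in which a random string, not a repetitive one, has high entropy in this theory.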

More generally, let D be some dataset; we might then be interested in the least complex theory that can generate ("explain") D. Assume C to be some complexity measure over the space of theories, e.g. the number of words in the shortest formulation of a theory or the number of axioms in a theory. By analogy with the above, we can define the entropy H(D) of D as the complexity of the theory T which minimizes C and which can generate D. The length |D| of D can be defined as the cardinality of D, that is, as the number of data-points in D. As above, it can be proved that for any natural number N there exists a dataset D such that H(D) ≥ |D|. In sum: don't expect a simple Theory-of-Everything.

Mixed Meanings

One common technique in creationist circles is to mix the two meanings above. First, they define information in Chaitin-Kolmogorov terms, demonstrating that information increases are necessary for macroevolution. Then they invoke Shannon's model to declare that it has been shown that information cannot increase! But the two models are very different and cannot be used in conjunction in that way. This is equivalent to pointing out that there is a city called Florida in Puerto Rico, so the people of Florida (the state) must be Puerto Rican!

Conclusion

Creationists often claim that mutation and natural selection cannot create new information, but they rarely, if ever, define their terms, show the math, or show what "information" has to do with evolution.

For instance, if a species has a gene that says, in effect, "Don't make a wing," and a mutation removes the "Don't", then the mutants will have wings. Yet removing the "don't" would seem to be a loss of information.

As we've seen above, Shannon's information theory does not seem like an appropriate tool for discussing the characteristics of evolving (or non-evolving) populations. But since creationists are loath to show their math, it is difficult to evaluate their claims.

