The Nature of Information III
George Hrabovsky
MAST
Introduction
In this article we examine some odds and ends about the nature of information.
Some Rules About Entropy
We will examine some more of the mathematics of entropy. We still do not have a good grasp of what entropy is, other than that it is some function of the number of pieces of information. The more ways that information can be combined, the higher the entropy. A good example of this idea is the letter tiles in a Scrabble game. If you throw the tiles into a bag and draw them at random, what is the chance that you will not get a meaningful word? For a small number of tiles the chance of disorder is lower than for a larger number of tiles. In other words, it becomes harder to make sense of a large amount of random information than of a small amount. Another way of saying it is that with a large quantity of information there is a greater chance of getting gibberish.
Having said that, here are some principles about the entropy of information:
H(p) is continuous in p. This means that small changes in a probability distribution will produce small changes in the entropy.
H(p) ≥ 0. This means that entropy is never negative: observing an outcome can never cost you information. The entropy drops to H(p) = 0 only when you already know everything, that is, when one outcome is certain.
H(p) ≤ c. This means that the entropy is bounded above by a constant that depends on how many distinct values the distribution can take; for M possible values, c = log M. The entropy reaches this bound when all values are equally likely. Another way of interpreting this is that when there are many equally likely options you are least able to predict what will happen next. Each of these three properties is easy to check numerically, as in the short example below.
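As a quick, informal check, here is a small Python sketch; the function name entropy_bits and the example distributions are my own, not from the article. It computes H(p) in bits and shows that the result is never negative, is zero for a certain outcome, and reaches the bound log2(M) for the uniform distribution.

import math

def entropy_bits(p):
    """Shannon entropy of a discrete distribution p, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A certain outcome: we already know everything, so H(p) = 0.
print(entropy_bits([1.0, 0.0, 0.0]))

# A biased three-outcome distribution: 0 < H(p) < log2(3).
print(entropy_bits([0.7, 0.2, 0.1]))

# The uniform distribution reaches the upper bound c = log2(M).
M = 3
print(entropy_bits([1.0 / M] * M), math.log2(M))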
Bits and Nats
I previously stated that the bit is the fundamental unit of information. The bit is actually the unit of the entropy of information, and only when the entropy is measured in base 2, that is, with binary logarithms. This is usually interpreted as on or off, so there are two possible states for any piece of information to be in: true or false, yes or no, and so on. There is another base we can use, called the natural base, or base e. In this base the unit of entropy is the nat, and one nat equals 1/ln 2, or about 1.443 bits.
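To make the two units concrete, here is a minimal Python sketch (the entropy function and the example distribution are my own) that computes the same entropy in bits and in nats; the two numbers differ only by the factor ln 2.

import math

def entropy(p, base=2.0):
    """Shannon entropy of a discrete distribution p, in the given logarithm base."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25]
h_bits = entropy(p, base=2.0)      # entropy in bits (base 2)
h_nats = entropy(p, base=math.e)   # entropy in nats (base e)

# The same quantity in two units: h_nats = h_bits * ln 2.
print(h_bits, h_nats, h_nats / math.log(2))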
Strings of Samples
Let’s say that you have a string of N samples of some variable, say x, each drawn from some probability distribution p(x). This string is denoted,

x = (x_1, x_2, ..., x_N).

In that case you will discover that each possible value x_i appears in the string some number of times, N_i. The probability of drawing that value is p_i = p(x_i), and the individual draws are independent of each other. As such the probability to see the string is given as the product,

P(x) = p(x_1) p(x_2) ⋯ p(x_N),

and this can be rewritten, assuming that there are M different values, by grouping the repeated factors,

P(x) = p_1^(N_1) p_2^(N_2) ⋯ p_M^(N_M).
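As an illustration, here is a small Python sketch, with an invented three-symbol alphabet and made-up probabilities, that draws a string of N samples and checks that the plain product over all N samples equals the grouped product over the M distinct values.

import math
import random
from collections import Counter

# Hypothetical three-symbol alphabet and distribution (my own example values).
values = ['a', 'b', 'c']
probs = {'a': 0.5, 'b': 0.3, 'c': 0.2}

random.seed(0)
N = 20
string = random.choices(values, weights=[probs[v] for v in values], k=N)

# Probability of the string as a plain product over the N samples...
p_direct = math.prod(probs[x] for x in string)

# ...and as a product over the M distinct values, each raised to its count N_i.
counts = Counter(string)
p_grouped = math.prod(probs[v] ** counts[v] for v in values)

print(p_direct, p_grouped)   # the two agree, up to floating-point rounding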
Asymptotic Equipartition Property
One useful analysis technique is to look at the behavior of a function as N becomes extremely large. Since P(x) is a product of powers, working with it directly quickly becomes unwieldy. Let’s reduce it, without changing the behavior, by taking its logarithm,

log P(x) = N_1 log p_1 + N_2 log p_2 + ⋯ + N_M log p_M.

We can then divide both sides by N,

(1/N) log P(x) = (N_1/N) log p_1 + (N_2/N) log p_2 + ⋯ + (N_M/N) log p_M.

Previously we defined the entropy as,

H(p) = -(p_1 log p_1 + p_2 log p_2 + ⋯ + p_M log p_M).

As N grows, the fraction N_i/N of draws that take the value x_i approaches its probability p_i, so the right-hand side above approaches -H(p). This is a result of the law of large numbers from probability theory.

Using base 2 and inverting the previous result we get,

P(x) ≈ 2^(-N H(p)).
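Here is a brief numerical sketch of that convergence, again using an invented distribution of my own: as N grows, (1/N) log2 P(x) for a randomly drawn string settles down to -H(p).

import math
import random

# Hypothetical three-symbol distribution (my own example values).
probs = {'a': 0.5, 'b': 0.3, 'c': 0.2}
H = -sum(p * math.log2(p) for p in probs.values())   # entropy in bits

random.seed(1)
for N in [100, 10_000, 1_000_000]:
    string = random.choices(list(probs), weights=list(probs.values()), k=N)
    log2_P = sum(math.log2(probs[x]) for x in string)   # log2 of the string's probability
    # By the law of large numbers, (1/N) * log2 P(x) approaches -H as N grows,
    # so P(x) is close to 2^(-N*H).
    print(N, log2_P / N, -H)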
How do we interpret the result P(x) ≈ 2^(-N H(p))? The probability of a typical long string depends only on its length and on the entropy, not on the particular elements of the string! This is the Asymptotic Equipartition Property, also called the AEP. The inverse of this,

2^(N H(p)),

is the effective number of strings of the given length. The actual number of length-N strings, with M possible values for each sample, is

M^N.

The difference between M^N and 2^(N H(p)) is the reason behind data compression schemes.
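To get a feel for the size of that difference, here is a rough Python sketch with an invented four-symbol alphabet: the number of typical strings, about 2^(N H(p)), is astronomically smaller than the total number M^N, which is why the data can be stored in roughly N H(p) bits rather than N log2(M) bits.

import math

# Hypothetical skewed four-symbol distribution (my own example values).
probs = [0.7, 0.1, 0.1, 0.1]
M = len(probs)
H = -sum(p * math.log2(p) for p in probs)   # entropy in bits, below log2(M) = 2

N = 100
total_strings = M ** N              # all possible strings of length N
effective_strings = 2 ** (N * H)    # approximate number of typical strings

# Compare exponents: log2 of the total count versus N*H for the typical set,
# and the tiny fraction of all strings that are typical.
print(math.log2(total_strings), N * H, effective_strings / total_strings)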