Other

Video - How do we Measure Information?

May 31, 2013

How can we quantify/measure an information source? We introduce the ideas of Nyquist & Hartley using a simple game involving yes/no questions. It's important to realize all of this happened before Claude Shannon arrived on the scene. However, this measure applies only when communication involves random sequences.

Transcript

Consider the following: Alice and Bob have figured out how to transmit messages between their treehouses. At first, they used flames at night, and shutters during the day. Then they used a wire, which they plucked in different ways. Eventually, they electrified this wire to send electrical pulses, and were now at work on an experimental wireless method.

The problem is in order to pay for their equipment, they needed money. So they decided to offer their service -- for a fee to others. And on the first day, Alice had three new customers who wanted to transmit messages to their friends over at Bob's treehouse. The first customer wanted to send a list of 10 coin flips.

The second customer wanted to send a 6-letter word. And the third customer wanted to send a poker hand. The question now is, "How much should she charge?" Well, the price of a message should depend on how long it takes Alice to transmit it. But how could she measure the length of different types of messages using a common unit?

To find out, let's play a game. Imagine you are Bob now. And you know Alice wants to send you these messages. But all you can do is get the answer to yes-or-no questions you've arranged.

Alice will answer by sending a sequence of zeros or ones using some method of variation. Recall that all of their methods of transmission involve the exchange of differences. So, a 1 could be represented by an open flame, or an open shutter, or an electrical pulse. No matter how they are manifested, we can simply call them 'binary digits' -- because a binary digit can have only one of two values -- 0 or 1.

So, let's say 0 represents a 'no,' and 1 represents a 'yes.' Your challenge, now, is to always ask the minimum number of questions in order to determine the exact message. First, let's consider the coin flips. For each symbol, the sender, Alice, can be thought of as selecting one of two different symbols -- 'heads' or 'tails.' Now how many questions do you need to ask to determine which she selected? One questions -- such as, "Is it heads?" -- will suffice.

For 10 flips, what is the minimum number of questions? Well, 10 flips times one question per flip equals 10 questions -- or 10 binary digits to transmit this message. Next, let's consider the letters. 1 of 26 different symbols.

Let's start with the simplest message -- which is 1 letter. How many questions are needed? Is it A? Is it B?

Is it C? Is it D? And so on. But that is not the minimum number of questions.

The best you could do is ask questions which eliminate half of the possibilities. For example the middle of the alphabet is between M and N. So we could first ask, "Is it less than N?" If we receive a 1 -- yes -- we cut out half of the possibilities -- [so we have] 13 left. And since we can't split a letter in half, we divide the possible symbols into sets of 6 and 7, and ask is it less than G?

We receive a 1, which is yes. And now we are left with 6 possible letters. And we can split them in half, and ask, "Is it less than D?" We receive a 0, which is no -- leaving us with three possible letters. And now we can pick a side and ask, "Is it D?" We receive a 0, which is no.

And finally, we are left with two possibilities. We ask, "Is it E?" We receive a no. And after five questions, we have correctly identified the symbol: F. Realize that we will never need to ask more than five questions.

So the number of questions will be at least 4, and at most 5. And in general, 2 to the power of number of questions equals the number of possible messages -- which we previously defined as the 'message space.' So how can we calculate the exact average -- or expected -- number of questions, given a message space of 26? We ask the reverse question. 2 to the power of something equals 26.

And to answer these type of questions, we naturally use a logarithmic function, base 2 -- because log, base 2, of 26 is the exponent 2 needs to be raised to to give us 26. Which is approximately 4.7. So on average, approximately 4.7 questions will be needed per letter, at minimum. And since she wants to transmit a word with 6 letters, Bob can expect to ask, at minimum, 28.2 questions -- Which means Alice will need to send at most 29 binary digits.

Finally, let's apply this formula to a new message -- the poker hand. Well, for each symbol, the sender, Alice, 1 of 52 different symbols. And in this case, the number of questions is the same as the number of times we need to split the deck and ask Alice which pile it is in -- until we are left with one card -- which we will find is usually 6 splits -- or questions -- and sometimes 5. But we can save time and just use our equation.

Log, base 2, of 52 is approximately 5.7, since 2 to the power of 5.7 is approximately 52. So the minimum number of questions, on average, is 5.7 per card. A poker hand contains five cards. So to transmit a poker hand requires 28.5 questions, on average.

We are done. We now have our unit. It's based on the minimum number of questions to define the message -- or the 'height' of the decision tree. And since Alice transmits this information as binary digits, we can shorten this [term], and call our unit the 'bit' -- instead of 'binary digit.' So we have 10 coin flip require 10 bits.

The 6-letter word requires 28.2 bits. and the poker hand requires 28.5 bits. Alice, then, decides to charge one penny per bit, and begins collecting her fees. Now this idea emerged in the 1920s.

It was one of the more abstract problems that communication engineers were thinking about. Ralph Hartley was a prolific electronics researcher, who built on the ideas of Harry Nyquist -- both of whom worked at Bell Labs after World War I. And in 1928, Hartley published an important paper titled "The Transmission of Information." And in it, he defines the word 'information' using the symbol h -- as: h = n × logarithm of s, where h is our information, n is the number of symbols -- whether they're notes, letters, numbers, etc. -- and s is the number of different symbols available at each selection.

And this can also be written as h = logarithm of s^n. And Hartley writes, "What we have done, then, is to take, as our practical measure of information, the logarithm of the number of possible symbol sequences. So, information is the logarithm of the message space. However, realize that throughout this lesson, we have been assuming that the symbol selection is random -- a convenient simplification.

However, we know that in reality, most communication -- such as speech -- isn't always random. It's a subtle mix of predictability and surprise. We do not roll dice when we write letters. And it is this predictability which can result in significant savings in the length of transmission.

Because when we can predict things in advance, we shouldn't need to ask as many yes-or-no questions to define it. But how could we formally model this subtle difference? This question brings us to the key insight in our story. Can you think of what it might be?