This chapter introduces why random shifts result in perfect secrecy. We explore hardware random number generators versus pseudorandom number generators, which expand a short random seed into a long sequence of "random-looking" numbers. When used in cryptography, pseudorandom generators offer "practical security," which rests on the secrecy of the random seed alone. Alice and Bob can safely assume that the "enemy knows the system" (one-time pad + pseudorandom generator) and focus on ensuring their seed is generated randomly and shared in secret.
Transcript
Transcript generated by YouTube auto-captions. May contain errors.
Consider this game. Eve instructs Bob to go into a room. The room is empty except for some locks, an empty box, and a single deck of cards. Eve tells Bob to select a card from the deck and then hide the card as best he can. The rules are simple: Bob cannot leave the room with anything, and he can put at most one card in the box. He wins the game if Eve is unable to determine his card. What is his best strategy?

Bob selects his card, the six of diamonds, and throws it in the box. At first, he considers the different types of locks. Maybe he should lock the card in the box with the key? Eve can pick locks, so he considers the combination lock instead; it seems like the best choice. But suddenly he realizes a problem: the remaining cards leak information about his choice, since it's now missing from the deck. The locks are a decoy. He shouldn't separate the card from the deck. He returns his card to the deck and shuffles the deck to randomize it. Shuffling is the best lock, because it leaves no information about his choice; the hidden card is now equally likely to be any card in the deck. He can now leave the cards openly, in confidence. He wins the game because the best Eve can do is simply guess. He has left no information about his choice; every card is equally likely to be the card he picked. Most importantly, even if we give Eve unlimited computational help, she can't do any better. When this is the case, we say we have achieved perfect secrecy.

On September 1st, 1945, the 29-year-old Claude Shannon published this idea in a classified paper. He thinks about it as follows. Imagine Alice writes a message to Bob, 20 letters long. This is equivalent to picking one specific page from the message space. The message space can be thought of as the complete collection of all possible 20-letter messages; anything you can think of is a page in the stack. Next, Alice applies the shared key, which is a list of 20 randomly generated shifts between 1 and 26. The key space is the complete collection of all possible outcomes, and generating the key is equivalent
to selecting a page from this stack at random. When she applies the shifts to encrypt the message, she ends up with a ciphertext. The ciphertext space represents all possible results of an encryption; when she applies the key, it maps to a unique page in this stack. Notice that the size of the message space is equal to the size of the key space and the ciphertext space. So if Eve has access to a page of ciphertext, the only thing she knows is that every message is equally likely. No amount of computational power will help improve a blind guess. This defines perfect secrecy.

The problem with the one-time pad is that we have to share long random keys in advance. To solve this problem, we need to relax our definition of secrecy by developing a definition of pseudorandomness.

When we observe the physical world, we find random processes everywhere. We can generate truly random numbers by measuring random fluctuations, known as noise. When we measure this noise, known as sampling, we obtain numbers. For example, if we measure the electric current of TV static over time, we will generate a truly random sequence. We can visualize this random sequence by drawing a path that changes direction according to each number, known as a random walk. Notice the lack of pattern at all scales: at each point in the sequence, the next move is always unpredictable. Random processes are said to be non-deterministic, since they are impossible to determine in advance. Machines, on the other hand, are deterministic: their operation is predictable and repeatable.

In 1946, John von Neumann was involved in running computations for the military, more specifically the design of the hydrogen bomb, using a computer called the ENIAC. He planned to repeatedly calculate approximations of the processes involved in nuclear fission. This required quick access to randomly generated numbers that could be repeated if needed. However, the ENIAC had very limited internal memory, so storing long random sequences was not possible. So von Neumann developed an
algorithm to mechanically simulate the scrambling aspect of randomness. First, select a truly random number, called the seed. This number can come from the measurement of noise, or from the current time in milliseconds. This seed is provided as input to a simple calculation: multiply the seed by itself, then output the middle of this result. Then use the output as the next seed, and repeat the process as many times as needed. This is known as the middle-squares method, and it is the first of a long line of pseudorandom number generators. They all follow the same principle: accept a short random seed and expand it into a long sequence. The randomness of the sequence depends on the randomness of the initial seed only. Same seed, same sequence.

What are the differences between a randomly generated and a pseudorandomly generated sequence? If we represent each sequence as a random walk, they seem similar, until we speed things up. The pseudorandom sequence must eventually repeat. This occurs when the algorithm reaches a seed it previously used, and the cycle repeats. The length before a pseudorandom sequence repeats is called the period, and the period is strictly limited by the length of the seed. For example, if we use a two-digit seed, then an algorithm can produce at most 100 numbers before reusing a seed and repeating the cycle. A three-digit seed can expand past 1,000 numbers before repeating, and a four-digit seed can expand past 10,000 numbers. If we use a seed large enough, the sequence can expand into trillions of digits before repeating, making it harder to distinguish between the two.

The key difference is important: when you generate a number pseudorandomly, there are many sequences which cannot occur. If Alice generates a truly random sequence of 20 shifts, it's equivalent to a uniform selection from the stack of all possible sequences of shifts. This stack contains 26 to the power of 20 pages, which is astronomical in size. If we stood at the bottom and shone a light upwards, a person at the top would not see
the light for around 200 million years. Compare this to Alice generating a 20-digit pseudorandom sequence using a four-digit random seed. This is equivalent to a uniform selection from 10,000 possible initial seeds, meaning she can generate one of only 10,000 possible sequences, a tiny fraction of all possible sequences. When we move from random to pseudorandom shifts, we shrink the key space into a much smaller seed space. For a pseudorandom sequence to be indistinguishable from a randomly generated sequence, it must be impractical for a computer to try all seeds and look for a match.

This leads to an important distinction in computer science between what is possible and what is possible in a reasonable amount of time. We use the same logic when we lock up our bike: we know anyone can simply try all possible combinations until they find a match, but it would take them days to do so. So, for the eight hours we lock up our bike, we assume it's practically safe. With pseudorandom generators, the security increases as the length of the seed increases. If the most powerful computer would take hundreds of years to run through all seeds, then we can safely assume it's practically secure, instead of perfectly secure. As computers get faster, the seed size must increase accordingly.

Pseudorandomness frees Alice and Bob from sharing all of their random shifts in advance. Instead, they share a short random seed and expand it into the same random-looking sequence when needed. But what happens if they can never meet to share the seed?