Video - Bitcoin - Cryptographic Hash Functions

Saying that a cryptographic hash function need be Computationally Efficient seems confusing to me. In fact, the more efficient it is, it opens the door to brute force reversal of the function. Wouldn't it be more precise to say that the ratio of the difficulty of computing the reverse function, is as great as possible?

For example, BCrypt and SCrypt are specifically designed to be inefficient to compute in some way to thwart brute force attacks (as is the PBKDF2 protocol for wrapping an efficient hash function in a way to make it less efficient to compute).

Transcript

Cryptographic hash functions are basically fundamental building blocks that are used within many cryptographic algorithms and protocols. They have a number of very important applications in the context of information security as a whole. Some of them were common algorithms in this category that are known as cryptographic hash functions include things like MD5 and also it has some predecessors like MD4 and others. As well as algorithms like SHA-256 and actually SHA-256 is preceded by other algorithms like SHA-1 and so on. And also, there are some algorithms that maybe you've heard of or maybe that are a little bit lesser known but things like RIPEMD and BLAKE and Skein and others.

Now, cryptographic hash functions are basically used as critical building blocks in many applications. And really the first motivating application, the first historical application of these types of hash functions was in the context of what's known as a digital signature. Digital signature are used in so many different cryptographic applications today. They are the cornerstone of many e-commerce protocols, they're used in doing things like Bitcoin generation and so on. Cryptographic hash functions are also used in things like messages indication protocols, in pseudo and a number generation in password security, even inscription to some degree. And in fact, aside from that are used in digital signatures, these hash functions are also used in other places in the Bitcoin protocol as well.

So, first of all, let me talk about what a cryptographic hash function actually is. And, of course, as the name implies the first thing is, it's a hash function. And by hash function I basically mean that it will take input, it's a mathematical function or transformation, if you will. That takes a particular input and we call those input a message. Okay, and a message can be of arbitrary length. And the hash function basically applies a mathematical transformation, maybe a set of mathematical transformations to this input to produce a single output and we typically call this output a digest. Although sometimes you will see the output referred to as a tag or as a hash or as a fingerprint, but digest is sort of the most common nomenclature. In fact, MD5 which was one of the earlier hash function stands for Message Digest 5, okay, and MD4 Massage Digest 4 so on and so forth. Okay.

Now, the message as I mentioned briefly can be of arbitrary size. They can be as long as you want or as short as you want. But the output, the size of the digest or the size of the tag is going to be fixed in length. And, for example, in the contact of a hash function like let's say SHA-256 the digest will actually be exactly 256 bits in length. Okay. So, it's kind of a fixed length output but an arbitrary length input. Okay. And the other thing I want to point out about these cryptographic hash functions is that the function here is a deterministic function. And by that, I mean that the output will always be the same for a given input. So, if you have a given input you're going to see the exact same output. You won't have a situation in which may be a given input will have two different possible output, it's going to be consistent. Okay.

Now traditional hash functions are -- had been used in computer science for quite some time and they are used in many computing applications. So, for example, you may have heard of something like a hash function used and they called the hash table. But the kinds of hash functions that are used in hash tables, and I want to stress this, they aren't necessarily the same as cryptographic hash functions. Okay. The qualifier cryptographic here is very very important and it usually means or implies that hash function has to have a certain set of critical design goals and maybe some particular properties in mind. That make it suitable for use in applications that use cryptography or in cryptographic applications areas where perhaps security or privacy or confidentiality authentication is a serious concern. Okay.

So first and foremost, and maybe in describing somebody's properties is that and maybe this almost goes without saying one of the first properties you want at a cryptographic hash function is that it should be computationally efficient. And by that I mean that it shouldn't take a lot of time to compute the output from a given input if you're given a message and you want to apply the set of transformations to that message to get a digest that set of transformation should not take a long time to implement on a computer. It should be very fast, relatively fast. And it almost goes without saying but I think it is important to stress and point it out because I've seen people come up with grossly inefficient hash functions sometimes and those would not be considered suitable in the context of when typical cryptographic hash functions are used for cryptographic applications. Okay.

And the second property you typically need in the context, and this is especially in the context of digital signing, is that you wanted to be the case that it's hard to find two inputs that actually map to the same output. I mean, two distinct inputs whose corresponding digest is identical. And this property is typically referred to as collision resistance. Okay, and that means it's hard to find a colliding pair of input. So in other words, if you have two inputs and let's say you have some message M1 and you have a massage M2 okay. There are output under the application of the hash function should not be the same. You won't never have to be the same that the -- you wouldn't ever have be the case rather that the output of M1 and M2 under an application of the hash function is the same. They should be never be the same, should always be different. Okay.

Now, I should take a step back here and point out that of course in practice given that massages can be of arbitrary size and given at the input or the output rather is a fixed size, it's not mathematically possible to guarantee that the output will always be different for two distinct messages. But what you typically want is not that the outputs are necessarily different but that it's hard to find two distinct messages that produced the same output. We know that they exist by virtue of the fact that there are many messages that can be hashed and only a finite and small number, or relatively small number compared with the number of messages. A smaller number of possible digest values, but aside from that consideration it should be hard, it should take a long time. And by long, I mean an astronomically long time to find two distinct messages that would result in the same output under the application of the hash function. Okay.

Now, the third thing I want to point out is that in many cases you might want also in the context of a hash function. For the hash function, you are able to hide information about the input. In other words, given the output it should be hard to glean anything useful or interesting about the input. Okay. Anything, any relevant detail and I don't just mean the whole input but maybe even something as simple as you know what's the input and even number or an odd number. I mean, that should be the kind of thing that should be hard to infer when you see the output. Even something as simple as the even is the oddness of the input. Okay.

Now, a fourth property I want to point out is that you typically want the output to be well-distributed. In other words, the output should look random. In other words, it should look like a set of coin flips took place not that there was a predictable way in which the output was created. Okay. And really what that means is that and you can think of it maybe more conceptually as it's almost as if you flipped a bunch of coin to get to the upward. It should look just that random. Okay. And so, what you can really think of cryptographic hash functions as is perhaps maybe the mathematical equivalent or mathematical analog of a meet finder of sort, they can take inputs and apply these mathematical transformations to them such that the output looks for all intense and purposes completely random and unrelated to the original input. Okay.

I do want to make a few quick remarks about these particular properties. And first of all, they are interrelated. For example, if you have a situation where the outputs, let's say really appear to bear no relationship to the input and the outputs also look random then that will in fact give you to a large that we allotted the collision resistance properties. Because just the fact that you can't predict the output and the fact that it hides all this information implies that it's going to be hard to find two inputs that are distinct that match with the same output. And so sometimes you get one property in exchange for the others.

And the second thing I want to point out is that typically these properties in practice or maybe even in the underlying mathematics are things that you hope for but you can't always guarantee that they'll always hold it and it may be entirely possible that you could design a hash function that you think is completely collision resistant, but someone might come along a year from now and come up with the more clever way for finding a collision. Maybe, they will find a clever shortcut that does not involve doing a brute force search of any sort. Okay, and it turns out that cryptographers, for better or worse, currently do not have any mathematical technique. They haven't developed techniques for being able to work around some of these limitations. And so we often do take it on face value that these schemes are secured based on how long they have been around. Okay.

Now, I also want to point out and the last thing I want to point out is that this treatment that I've given is not meant to be mathematically rigorous by any stretch of the imagination. There are far more precise ways to describe these underlying properties. But my hope instead is that this video gives you may be a bit of a flavor for what is required of a cryptographic hash function without maybe getting bogged down in some of the mathematical minutiae and formalism.

(END OF AUDIO)