Cryptographic ideas

Authenticity and privacy are important ideas in computer security; we'll discuss the way algorithms are used to provide these. (The algorithms themselves we will not be discussing, though you are welcome to read about them!)

Hashing

Suppose you have transferred a large data file; how do you check there were no errors? A first strategy is to check the size: if they have the files have different sizes, an error definitely occurred.

A more sophisticated strategy is to create a checksum: split the data into small pieces, interpreted as numbers, and add them all together. If the answers are the same, either the errors added to 0 (via overflow), or else the pieces are all the same.

Exercise: this doesn't mean the file was transferred correctly; why not?

The topic of error detection codes is rich, and won't be covered here (though it is very important for the theoretical study of language!).

A more sophisticated strategy is to use a hash function. A hash function is a way to associate a number (the "hash") to data, in a way that it is extremely unlikely that you will be able to find two blocks of data with the same hash (called a "collision"). The process of applying the hash function is called "hashing."

Some popular hash algorithms are:

Some hash algorithms are used for cryptography, and need to have certain mathematics justifying important properties (e.g., unlikelihood of finding collisions, lack of correlation between data and its hash, and others). For our purposes, we want our hash to not have collisions, but cryptographic strength is not important.

You can experiment with hashes on Linux or other UNIX-like systems. Typically hashes are displayed not as decimal numbers, but rather as hexadecimal, using digits 0-9a-f.

$ echo hi > testfile; echo hi2 > testfile2

$ md5sum testfile testfile2
764efa883dda1e11db47671c4a3bbd9e  testfile
4da1c6449a2a34a338e56d6718838016  testfile2

$ sha1sum testfile testfile2
55ca6286e3e4f4fba5d0448333fa99fc5a404a73  testfile
906faceaf874dd64e81de0048f36f4bab0f1f171  testfile2

$ sha512sum testfile testfile2 # sha2, hash size is 512bit
d78abb0542736865f94704521609c230dac03a2f369d043ac212d6933b91410e06399e37f9c5cc88436a31737330c1c8eccb2c2f9f374d62f716432a32d50fac
testfile
d1fde846964c462fe3686eb0c2bed0ab1c66feb77bd06a8f6aefdca90c04036b8c8bdd8ee65567ea2f4d30714f4343d0151a13b9937473d526f3cda5d5413023
testfile2

$ b2sum testfile testfile2 # blake2
7ea59e7a000ec003846b6607dfd5f9217b681dc1a81b0789b464c3995105d93083f7f0a86fca01a1bed27e9f9303ae58d01746e3b20443480bea56198e65bfc5
testfile
768d17d444de521eb82f9c45cf0d6384c87f18ea67e164c7a832e24ab47483be0c006fc8ab806694e670f4df306468c5f30f6a987f7f8ba02c011ba641db3a7b
testfile2

Notice that hashes are completely changed even though the file contents are nearly the same.

Exercise: do the same for files with "cat" and "cat2" instead of "hi" and "hi2"

Symmetric cryptography

You might know cryptography in the sense of symmetric ciphers: there is a secret SEC. To encrypt a message M, called the cleartext, you apply a mathematical operation enc(SEC, M). The result is the ciphertext X.

Anyone who knows the same secret SEC can apply a mathematical operation dec(SEC, X) to recreate the message, that is,

dec(SEC, X) = dec(SEC, enc(SEC, M)) = M

Exercise: implement the Caesar cipher in python, which advances each letter of 'M' by 'SEC = n': enc(1, "a") = "b", etc.

Asymmetric cryptography

Asymmetric cryptography works differently: there is a pair of related keys (numbers), for now call them A and B. Just like with symmetric cryptography, there is a mathematical operation RSA(key, x). The keys A and B have a special relationship: if a message is encrypted with A, it is decrypted by B, and vice-versa:

RSA(A, RSA(B, M)) = M
RSA(B, RSA(A, M)) = M

Without B, RSA(A, x) cannot be decrypted, even with A.

The first system for constructing such A, B, and RSA is called the [Rivest-Shamir-Adelman algorithm], after the three authors. Now there are many such systems, not all of which can properly be called RSA.

Read and attempt to follow along with the PGP tutorial.