The TLS Handshake at a High Level
Cryptography — the art of coding messages in a way that they would be incomprehensible to anybody except the intended receiver — had been around for millennia, although most applications were military. Strong cryptography algorithms optimized for computer use, such as IBM's Data Encryption Standard (DES) were widespread by that time, but one critical element of secure cryptographic protocols is that the algorithm must remain secure even if all the details of its workings are known: the security must rely solely in the knowledge of a shared secret, usually referred to as the key (although in reality the "key" is actually not a key at all, but instead a very long secret number). By applying the algorithm with the key to the message to be sent, the encoded message would be produced, and by applying the algorithm with the key to the encoded message, the original message would be retrieved. This sort of cryptographic algorithm is called a symmetric one, referring to the symmetry between sender and receiver: they must both possess identical copies of the key. This sort of key-based cryptography has been intensively studied for centuries and encryption algorithms exist which, if applied correctly, can for all intents and purposes be considered uncrackable.
If nothing else, knowledge that cryptography is in place produces a sense of serenity in computer users, so it seemed clear that Netscape's secure socket layer should encrypt all communications. The first challenge in applying symmetric key algorithms to internet based interactions is that of key management. In the context of the sort of military applications that cryptography grew up around, key distribution, while difficult, was usually done face-to-face prior to the transmission of encoded communication over a distance. Unfortunately, this wasn't possible in the context of a commercial web browser: users couldn't establish a shared secret with, say, an online book store without first having a secure communications channel with the same bookstore. That is — they couldn't secure the communications channel without first securing the communications channel!
As it turns out, mathematicians Ralph Merkle, Whitfield Diffie and Martin Hellman had actually worked out a clever solution to that problem — that is, they figured out a way for two people communicating over an insecure/visible channel to agree on a number that an eavesdropper can't figure out. This is called a Diffie-Hellman exchange: by applying the Diffie Hellman protocol, the sender and receiver can agree on a shared number securely; by then using that shared number as the key in a classic cryptography protocol, they can bootstrap a secure channel over an insecure one.
Diffie-Hellman key exchange, while a stroke of brilliance, is actually pretty easy to understand. You can even imagine two people applying the protocol by shouting numbers at each other across a crowded room, each with a pencil and paper to compute interim results. If you and I found ourselves in that situation, at the risk of annoying the other people in the room, we could perform a secure key exchange in front of all of them. You would start by picking two random numbers p and g. You shout them across the room at me and we each write them down. You then pick a number a at random and write it down. You don't tell me that number, though — instead, you figure out the answer to ga%p (the remainder when ga is divided by p). You shout out this result, and I write it down. I now do the same thing with my own secret number b and shout out the result of gb%p, which you write down. Now, you work out the result of (gb%p)a%p and I work out the result of (ga%p)b%p. Because the modulus operator distributes over exponentiation, the result of both computations is guaranteed to be identical, but because an eavesdropper can't work backwards to figure out a or b from the interim results, the shared key remains secure: only you and I know the final result. This concept of distributed key agreement is known as public-key cryptography, mostly to distinguish it from the aforementioned symmetric cryptography.
There are some restrictions on the numbers used - p must be a prime number, for example, and g must be a primitive root to it. You don't need to keep this in mind or be intimately familiar with it to understand the protocol or see why it works, though.
With these two elemental building blocks, the basis of a secure web-based communication channel can be established: first, the browser and the server run through a Diffie-Hellman key exchange, then they agree on a secure cryptography algorithm to use that as a key for. At the time that SSL was invented, there were a few well-known contenders including DES, IDEA and RC4. The Netscape engineers responsible for defining SSL recognized that the history of cryptographic communications demonstrates that encryption algorithms only get weaker over time, and that these algorithms were almost sure to be replaced by stronger ones in the future (they were, in fact, completely correct in this regard). Therefore, they didn't specify that any one specific algorithm be used, but instead left the specification open to allow the selection to be negotiated dynamically based both on the client and server's preferences as well as their capability — in other words, if the client said that it wanted to use IDEA but the server didn't support IDEA, they could still fall back to, say, DES.
As it turns out, though, this isn't quite enough. There are two pieces still missing here, but neither is quite as obvious as the need for an encrypted channel. The first is verification. It isn't enough to just encrypt the message - imagine the encrypted message "We attack at dawn, unless the fire on the ridge is lit". An adversary could change the meaning of the message without actually decrypting it, by removing the "unless the fire on the ridge is lit" part; neither the sender nor the receiver would ever have any way of recognizing the truncated message. The attacker would have to know a thing or two about the format of the message to pull this off, but in the rigid world of computer communications, this sort of knowledge is the rule rather than the exception. To properly secure a communications channel, encryption isn't enough - message verification is required, too.
The symmetric-encryption analog of message verification is called a message authentication code (MAC). The idea here is to append the message with a digest of that same message, combined with another secret key. The digest is computed over the entire length of the message, so that removing or changing any part of the message changes the digest. A checksum is a simple example of a digest algorithm: you add up all of the characters in a message and combine the sum with a secret key. As it turns out, though, a checksum isn't a strong enough digest algorithm for cryptography: although an attacker wouldn't be able to change the MAC without knowledge of the shared key, he could figure out what the checksum was and work backwards to replace the original message with a different message that shared the checksum. This is called a collision in cryptography nomenclature. What is needed for proper message verification, then, is a secure hash algorithm: one that makes it impossible (or at least infinitely improbable) to find collisions given a message and a hash. Perhaps surprisingly, developing secure hash algorithms is harder than developing secure encryption algorithms. When SSL was first developed, two good secure hash algorithms were known: MD5 and SHA (MD stands for message digest and SHA stands for, even less creatively, secure hash algorithm). Just like cryptography algorithms, the SSL designers knew that hash algorithms would come and go and thus needed to include a way for the client and the server to negotiate not just an encryption algorithm but also a message authentication algorithm.
But even with all of this, it's still not quite enough to produce a secure channel. The second missing piece is protection against active attackers. Recall the Diffie-Hellman key exchange that you and I imaginarily performed at the hypothetical crowded cocktail party a few paragraphs ago. Now imagine that the room was so crowded that we couldn't see, or even hear each other, but somebody standing in the middle of the room could hear both of us. Every time one of us shouts out a number, he writes it down and then makes up his own, performing two simultaneous key exchanges with each of us. At the end, we each have a shared number with him, which we think we share with each other. If we use those shared secrets to encrypt our message to one another, he can easily intercept them, read them, and then re-encrypt them. The vulnerability here is what's known as the "man in the middle" attack for obvious reasons and is much easier to pull off in the context of computer to computer communications.
So, to properly establish a secure communications channel over an insecure medium, both sides must first agree on a cryptographic algorithm, then on a verification algorithm, then perform a secure key exchange, but before doing any of this, they must first authenticate each other! In contrast to the other problems I've outlined so far, there's no nice, neat, pat solution to this problem. To begin with, what does it even mean to authenticate another computer? In the context of the world-wide web and SSL, the server has a "name": www.amazon.com, or www.ebay.com, or www.commandlinefanatic.com. The client, on the other hand, doesn't necessarily have a name: it might be a registered user on the web site, or it may be an anonymous shopper. Fortunately, it suffices, to guard against a man-in-the-middle attack, for the browser to authenticate the server. This still leaves open the thornier question of how to securely do that.
Recalling, then, the discussion above about message verification, you might reasonably assume that the browser would authenticate the server by computing a digest of the server's domain name and verifying that... but remember that MACs require, again, a shared key — but you can't establish a shared key without first authenticating the server. The solution to this problem is what's called a public key infrastructure (PKI). The underpinning of a PKI, in turn, is the distributed-key equivalent of MACs: digital signatures. A digital signature, like a MAC, is an encrypted copy of a message digest, but unlike a MAC which requires the verification step to be performed with the same key that it was generated with, a digital signature splits the key into two parts: the private key which is applied to the digest to create the signature and the public key which is used to verify the signature. In this way, the server's host name (or whatever else you want to make verifiable) can be signed with the private key that corresponds to a public key which itself is already known to the browser. This does mean that the browser has to come pre-configured with a set of trusted public keys: these constitute the browsers trust store, and include a well-known set of commercial certificate authorities such as Verisign, Thawte, Comodo and GoDaddy. The certificate authority uses their (secret) private key to apply a signature to the server's host name, and the browser then uses the preconfigured public key to verify the signature.
Now, I glossed over digital signatures and public keys earlier. It's clear from the workings of Diffie-Hellman that you can't just use the same technique to implement digital signatures - for one thing, the public/private keypair must be persistent and reusable. Not long after Diffie and Hellman created their eponymous secure key exchange algorithm, another group of mathematicians named Ron Rivest, Adi Shamir and Leonard Adelman came up with a different algorithm, based on similar principals, which they named "RSA" after their initials. An RSA keypair is actually a trio of numbers e, d and n related such that mde%n=m. To sign a message m, convert it to a number (in this case, a secure hash h) and generate the signature s = hd%n. The receiver can verify the signature by computing the same secure hash and then computing se%n; by the same distributivity properties that allow the Diffie-Hellman key exchange to work, the original h will be the resulting output. A signature can't be forged without knowledge of d, so d is called the private key and the e,n pair is called the public key.
However, what the certificate authority signs isn't the host name of the server computer but instead a longer document called a certificate (hence the name "certificate authority"). The certificate includes, at a minimum, the server name, a validity period and another public key! Why would the server have its own public key? In fact, this is actually the part that guards against man-in-the middle attacks. Once the client verifies the signature of the certificate and ensures that the certificate belongs to the server whose host name it was trying to connect to, the server must then prove possession of the private key that corresponds to the public key in the certificate by signing a random message provided by the client. If it didn't perform this step, the proverbial man in the middle could just pass the signed host name messages through unmodified and start his attack once the client had accepted the authenticity of the server.
RSA isn't the only digital signature algorithm around, and wasn't in 1995 when Netscape was developing the SSL protocol: a similar, but somewhat more complicated algorithm named DSA (Digital Signature Algorithm) also existed, and was a U.S. government standard. DSA came from the ElGamal algorithm which was developed by Dr. Taher ElGamal — who happened to be the person that Netscape hired to develop the SSL algorithm in the first place. And, in fact, if you followed the description of the RSA algorithm closely, you might realize that you could easily turn it around and use the public key to generate an encrypted message that only the holder of the private key could read: hence Diffie-Hellman is also not the only secure key exchange method available either, once you have a way to verify ownership of public keys. The client can create a session key, encrypt it using the public key of an RSA algorithm and send it over a public channel — only the holder of the private key can read it.
So now, you can see the building blocks of the SSL handshake: first, the client establishes a public, non-secure connection with the server. Before it sends any potentially sensitive data (that is, any data at all), though, it performs an SSL handshake by first informing the server which public key exchange, encryption, verification and digital signature algorithms it understands. The server selects one of the ones that it understands and initiates a key exchange using the selected key exchange algorithm (in the first release, this was either Diffie-Hellman or RSA). The first step in doing so either way, though, is by returning a signed certificate which, at a minimum, includes the server's host name exactly as the browser requested it, a public key corresponding to either RSA or DSA (depending again on which digital signature algorithm was chosen), and a signature generated by a certificate authority which was already trusted by the browser. The key exchange continues with the server signing the key exchange, hence proving possession of the private key corresponding to the public key in the certificate. The client and the server then exchange two keys: one for the encryption and another for the verification. Once the keys have been exchanged, every subsequent packet is encrypted and authenticated using the two chosen algorithms.
The protocol that Netscape put together was very adaptive, but researchers found subtle flaws as time went on — the original SSLv2 was supplanted by SSL 3.0, which was then turned over to the IETF who renamed it TLS 1.0. TLS itself is currently working on its fourth revision, which looks like it will involve fairly significant changes to the handshake process, but the core principles of cipher negotiation and key exchange will always remain intact.