RSA Coding - by Dr. Sarah - Adapted from Bill Bauldry's Cryptography in the Classroom.

Sophie Germain primes are still of interest today. People are still trying to find larger and larger Sophie Germain primes. These primes are of interest in coding systems.

In 1977, Ronald Rivest, Adi Shamir, and Len Adelman invented a coding system called RSA that is the most successful current commercial cryptosystem. A public key for encoding is given to anyone, while a private key for decoding is kept secret. The security of the system is based on the idea that factoring large numbers is extraordinarily difficult to do in practice.

The coding system is based on modular arithmetic.
a mod b is the whole number remainder of a/b. So, 9 mod 6 = 3, because 3 is the whole number remainder when 9 is divided by 6. You can also do this on your calculator. 9/6=1.5, so we take the decimal part (.5) and multiply by 6 to get back 3.

To Code and Decode

  • We start with 2 primes p and q that have many digits (at least 200 digits each). We could use a Sophie Germain prime to obtain our two prime numbers, because when you have a Sophie Germain prime, you automatically get another prime number, by doubling it and adding one.

  • We find numbers e (encoding number) and d (decoding number) so that e d mod (p-1)(q-1) = 1. There are mathematical methods of finding e and d, but we won't go over them. (Extra credit if you read the relevant section in the Heart of Math book and report back on it).

  • Let n = p q. We publish e and n. To encode a message, words are first converted to a long string of numbers. For example, my initials, SG, might be converted to 1907 via the conversion a-->01, b-->02, ..., z-->26. Then, we take the string of numbers, raise it to the eth power, and reduce mod n. This gives us the coded message. For example,
    1907e mod n
    is the coded message. Notice that anyone can encode since they know e and n, which have been made public.

  • RSA coding is based on the fact that d is kept secret. We decode via taking the coded message, raising it to the dth power, and reducing mod n. For example, if the coded message was 232, the decoded message would be
    232d mod n.
    Notice that you can't decode unless you know d, which is kept secret. In addition, we can't just solve for d, since we would need p and q to do so in order to solve the equation ed mod (p-1)(q-1)=1 for d. But, n has so many digits that it is very hard for a computer to factor n into the product pq at present computer speeds, so people cannot solve for d. Hence, they cannot decode messages.

  • To Review:
    Start with two large primes (could use a Sophie Germain prime p, and 2p+1)
    Now find e and d, the encoding and decoding numbers, by an equation that involves modular arithmetic.
    To encode we turn our message into numbers and then calculate (uncoded message) e mod n.
    To decode we calculate (coded message) d mod n and then turn our numerical message into letters.
    Since d is kept secret and is very hard to solve for, this is a good coding system!

    An Example:

    We'll choose p=163 and q=229 as examples to work with in order to give you a better sense of some of the issues involved. Note that these aren't 200 digits each, as they are only 3 digits long, but that would take way too much space and time to work with here.

    e and d can be found via a mathematical process (that we won't go into) to be e=1151 and d=5327. Let's check that e and d satisfy e d mod (p-1)(q-1) = 1. In other words, we want to check that 1151*5327 mod (163-1)(229-1) = 1. This is equivalent to checking that 1151*5327 mod (162)(228) = 1, which is the same as 6131377 mod 36936. So, we want the remainder when 6131377 is divided by 36936. So, I would take 6131377 and divide by 36936. We get 166.00002707... Since we want the remainder, we subtract 166, and then multiply by 36936. We get an answer of 1 (recall that the calculator has rounded, so that changes the answer slightly). Hence 1151*5327 mod (163-1)(229-1) = 1, and so we see that e and d were valid numbers to use for encoding and decoding.

    In order to encode SG, which was converted to 1907, we look at 1907^e mod n = 1907^1151 mod 163*229= 1907^1151 mod 37327. Try doing this on your calculator. You will get an error. This is because there are too many digits for your calculator. And remember, p and q are supposed to be hundreds of digits long. We took them as only 3 digits long just for convenience.

    1907^1151 =
    Look how many digits 1907^1151 is - no wonder you couldn't do it on your calculator - 3776 digits!

    Now let's reduce mod 37327 -- 1907^1151 mod 37327 = 12525
    So, 12525 is the coded message standing for SG. No one can decode this message unless they have d.
    To decode, I do (coded message)^d mod n = 12525^5327 mod 37327 and get back 1907, which I then convert back to SG.

    In practice, as mentioned before, we start with prime numbers p and q that are at least 200 digits long. The block size (the length of the string of letters we convert to numbers) is the largest grouping that stays less than n=pq. So, if I have a page of text, I divide it up into blocks that are as large as possible (less than n) and then convert these blocks to numbers. Then, I take those numbers, raise them to the eth power, and reduce mod n. For the numbers used in real life, it would be impossible (with present computer speeds and memory) to directly raise them to the eth power. Instead, the "square and multiply" function for calculating powers with large numbers in modular arithmetic is a clever way to easily calculate modular powers. The basic idea is to square the number, then reduce it mod n (this gives the remainder, which is must smaller than the square), and then multiply again and reduce mod n, and keep repeating this process until we have found the number to the eth power mod n. By reducing mod n every time we square or multiply, the numbers are kept small. You would be surprised to find out how quickly your calculator can "square and multiply" to find that 1907^1151 mod 37327 = 12525, even though it cannot do the 1907^1151 part of the calculation directly.