COT 5937 Homework #2 Solutions Summer 2001 1a) A frequency analysis reveals the following: ciphertext freq --------- ---- C 37 G 24 S 20 K 28 Y 15 I 15 We can be fairly safe to guess that C -> e. Also, we are given that F -> w. Searching for TRI grams we find an interesting 5-gram, FZCCN that shows up 3 times. With a little bit of luck, we can guess Z -> h and N -> l, forming the word "wheel" for that 5-gram. We also find that occuring directly after that 5-gram in two places is the 6-gram DGYYSF, which we know ends in w. G is quite frequent, and plugging in a t for it yields too many consonants in a row in a couple places. So, we guess that G is a vowel. The most likely vowel is a. Plugging in G -> a looks reasonable. From this, it is unlikely that Y is a vowel, and fairly likely that S will be a vowel. Trying S -> o and plugging in the common consonants for Y show that the most likely answer is Y -> r, which would indicate that D -> b to form barrow, or the 11-gram wheelbarrow in two places of the text. From here on out, you separate out what you can and look for common words to appear. doing this yields U -> t (for the), I -> d (for wheeled), and several others. Here is a list of the key: A->v C->e D->b E->i F->w G->a H->f I->d J->c K->s L->y M->m N->l O->n P->u Q->j S->o U->t W->g X->p Y->r Z->h and the message: i may not be able to grow flowers but my garden produces just as many dead leaves old over shoes pieces of rope and bushels of dead grass as anybodys and today i bought a wheelbarrow to help in clearing it up i have always loved and respected the wheelbarrow it is the one wheeled vehicle of which i am perfect master 1b) Using the Kasiski test, we find the trigrams "HJV" and "KFT" appearing more than twice. In particular, "HJV" starts at location numbers 108, 126, 264, 318, and 330. "KFT" begins at location numbers 80, 98, and 254. The greatest common divisor of both of these sets of values is 6. This gives us a strong indication that m=6. Calculating the corresponding indexes of coincidence, we find values ranging from .042 to .083, with four of the six values above .063. This corroborates the evidence that m=6. Now, calculate the mutual index of coincidence between the first set of letters and the rest, for each possible shift. Doing so yields the following original keyword: k1, k1+15, k1+22, k1+13, k1+17, and k1+12. Try all 26 possibilities for k1, the first character of the keyword, and you find k1 = 2, which corresponds to the keyword "CRYPTO" which turns out to be the correct keyword. Applying this to decrypt the message, we get: I LEARNED HOW TO CALCULATE THE AMOUNT OF PAPER NEEDED FOR A ROOM WHEN I WAS AT SCHOOL YOU MULTIPLY THE SQUARE FOOTAGE OF THE WALLS BY THE CUBIC CONTENTS OF THE FLOOR AND CEILING COMBINED AND DOUBLE IT YOU THEN ALLOW HALF THE TOTAL FOR OPENINGS SUCH AS WINDOWS AND DOORS THEN YOU ALLOW THE OTHER HALF FOR MATCHING THE PATTERN THEN YOU DOUBLE THE WHOLE THING AGAIN TO GIVE A MARGIN OF ERROR AND THEN YOU ORDER THE PAPER 1c) Doing a frequency count, we find that the two most common characters are 'C' and 'B'. Since there are quite a few more 'C's than 'B's, we will assume that the decryption of 'C' is 'e'. We can be fairly certain that 'B' decrypts to one of the following letters: 't','a','o','i','n', 's', 'h', or 'r'. Now, let's set up our affine cipher equations (where integers a and b are the encrypting keys and x is the value of the letter that is the decryption of 'B'): 2 = 4a + b --> b = 2 - 4a --> 2 - 4a = 1 - xa 1 = xa + b --> b = 1 - xa (x-4)a = -1 Since gcd(-1,26) = 1, we must also have gcd(a,26) = 1 and gcd(x-4,26) = 1. Now, let's look at our possible values of x: x = 0, 7, 8, 13, 14, 17, 18, 19 But, we know that all even values of x are impossible. Also, 17 can not work because 17-4 = 13, and that is not relatively prime to 26. That leaves us to try the values 7, 13, and 19, corresponding to h, n and t. When our second equation is 1 = 7a + b, we end up solving a=17, b=12 This turns out not to work. When our second equation is 1 = 13a + b, we end up solving a=23, b=4. This also turns out not to work. When our second equation is 1 = 19a + b, we end up solving a=19, b=4. This DOES work!!! It gives us the Canadian national anthem in French, as follows: O CANADA TERRE DE NOS AIEUX, TON FRONT EST CEINT DEFLEURONS GLORIEUX! CAR TON BRAS SAIT PORTER LEPEE, IL SAITPORTER LA CROIX! TON HISTOIRE EST UNE EPOPEE DES PLUS BRILLANTS EXPLOITS. ET TA VALEUR DE FOI TREMPEE, PROTEGERA NOS FOYERS ET NOS DROITS. 1d) Looking at the large number of repeated letters in the ciphertext(EE, SS, KK, MM, GG) it is clear that a monoalphabetic cipher(substitution) was not being used. The only non-monoalphabetic ciphers introduced in this chapter were Vigienere, Hill, permutation, LFSR Stream Cipher, and Autokey. First we will assume that the technique used was exactly one of these and not some hybrid. We were told that Hill is very difficult to break with a ciphertext only attack. So, we cross that off our list. A frequency analysis indicates that a permutation cipher is impossible. (Notice that a permutation cipher should leave the frequencies unchanged. But, the frequencies of letters in this ciphertext do not come close to resembling the list given to us in page 26 of the text.) It is unclear exactly how an LFSR stream cipher would be implemented on an alphabet of size 26. So, we can try Autokey next. There are only 26 possible keys. Trying all of these yields no solution. Finally, we are left with Vigienere before we consider more difficult options. An index of coincidence calculation indicates that the keyword length is 6. A calculation of mutual index of coincidence reveals that the keyword is of the form k1, k1+14, k1+11, k1+21, k1+24, k1+5. Cycling through the 26 possibilities, we find the keyword that works is "THEORY". Here is the resulting plaintext: I GREW UP AMONG SLOW TALKERS. MEN IN PARTICULAR WHO DROPPED WORDS A FEW AT A TIME LIKE BEANS IN A HILL. AND WHEN I GOT TO MINNEAPOLIS WHERE PEOPLE TOOK A LAKE WOBEGON COMMA TO MEAN THE END OF A STORY I COULDNT SPEAK A WHOLE SENTENCE IN COMPANY AND WAS CONSIDERED NOT TOO BRIGHT. SO I ENROLLED IN A SPEECH COURSE TAUGHT BY ORVILLES AND THE FOUNDER OF REFLEXIVE RELAXOLOGY, A SELF HYPNOTIC TECHNIQUE THAT ENABLED A PERSON TO SPEAK UP TO THREE HUNDRED WORDS PER MINUTE. 4) With the given information, we can only verify if m=1,2 or 3 is correct. There is not enough information to determine a specific key of a larger size. Try m=1: We have that b -> R so we must have 1(a) = 17 a = 17 is the solution. Now, we know that r -> U. Try 17(17) = 289 mod 26 = 3 which is NOT 20. Thus, we can eliminate m=1 as a possibility. Try m=2: Using brea -> RUPO, we have the following matrix equation: (1 17) X (a b) = (17 20) (4 0 ) (c d) (15 14) You'll notice that the matrix on the left is NOT invertible. Thus, we know that m can NOT equal 2. Try m=3: Using breathtak -> RUPOTENTO, we have the following matrix equation: (1 17 4) (a b c) (17 20 15) (0 19 7) X (d e f) = (14 19 4) (19 0 10) (g h i) (13 19 14) The inverse of the matrix on the left is (10 2 5) (7 2 1) (7 17 1) Thus, we have: (a b c) (10 2 5) (17 20 15) (d e f) = (7 2 1) X (14 19 4) (g h i) (7 17 1) (13 19 14) So, our encrypting matrix is (3 21 20) (4 15 23) (6 14 5) Now, let's check if that works to encode the next three characters. We should have ing -> SUP. Let's encode ing with our matrix: (8 13 6) (3 21 20) (8 5 21) X (4 15 23) = (6 14 5) This gives us the encryption "IFV", not "SUP". Thus, m is not 3. From this we can conclude that with the given information no value of m, nor any specific key can be determined. (Though we can say that m > 3, and for any value of m > 3, we'd need more plain and ciphertext.) 5) Let B be a 3x3 matrix with each row being the column vector b1,b2,b3 given in the problem. Let Y1 be a 3x3 matrix with each row equal to three letters of cipher text and X1 be a 3x3 matrix with each row equal to the corresponding plaintext of the contents of Y1. Let Y2 and X2 be set up similarly. Let the matrix M be the key in the Hill cipher. Now we can form the following two equations: Y1 = (X1)M + B Y2 = (X2)M + B Y2 - Y1 = (X2 - X1)M M = (X2 - X1)^(-1)(Y2 - Y1). Now, we have enough given information (18 characters of plaintext with matching ciphertext to substitute for X1, X2, Y1, and Y2 in the equation above. ( 3 18 17) (23 11 9) (20 19 18) Y1 = (12 18 8) Y2 = ( 1 25 20) Y2 - Y1 = (15 7 12) (14 15 11) (11 11 12) (23 22 1) ( 0 3 8) ( 3 4 16) ( 3 1 8) X1 = (18 15 11) X2 = (20 0 19) X2 - X1 = ( 2 11 8) ( 0 24 4) ( 8 14 13) ( 8 16 9) Using row operations, we find that the inverse of X2 - X1 is (15 3 10) (X2 - X1)^-1 = ( 4 3 14) (20 18 1) So, we can now solve for M: (15 3 10) (20 19 18) ( 3 6 4) M = ( 4 3 14) X (15 7 12) = ( 5 15 18) (20 18 1) (23 22 1) (17 8 5) b = (8 13 1). This can be derived by solving (3 18 17) = (0 3 8) ( 3 6 4) (b1 b2 b3) X ( 5 15 18) + (17 8 5) You can now go back and check to see that all 6 triplets do get encrypted correctly with these keys. 11) One method, especially if you have already written code to cryptanalyze Vigenere, is to take the given ciphertext, and produce a corresponding ciphertext that would be encrypted with straight Vigenere. But, you do now know what the keyword length is. Thus, what you must do is guess the length of the keyword and produce the corresponding straight Vigenere ciphertext under that assumption. The idea here is that if you guess the length of the keyword is 4, then you keep the first four characters as is, then subtract 1 from the value of then next four characters, then subtract 2 from the value of the subsequent four characters, and so on. Then, use the index of coincidence analysis on the new ciphertext produced, ONLY for the guess you made of the keyword length. If your guess was right, these values should be high. If not, that means that your guess at the keyword length was wrong. Using this method, the lengths 1, 2, 3, and 4 yield low ICs. But, with m=5, the ICs are very high, most of them are even above the expected .065 for English. So, from this point, we take our "new" ciphertext that should be the product of a straight Vigenere encryption. Using our old tools, we try to use the mutual index of coincidence to determine the relative shifts. However, in doing so, you get some conflicting evidence. The best you can do from this point is look at what you have and try out several different relative shifts. I ended up looking at each of the frequencies for the five groups of letters by hand and trying to match them up with the given probabilities in the text. (I assumed that common letters would have a frequency of at least 1 in each group, and that the real common letters would have fairly high frequencies. So, I'd try shifts, and then throw them out if I found that a particular shift meant there were 4 Qs or something strange like that.) Doing this analysis convinced me that the key for the second group of letters was 17 or R, and that the shift for the fourth group of letters was 12 or M. Plugging these into the first word, it made sense for the third letter in the first word of the plaintext to be a vowel. Trying out all 5 vowels as possibilities and double checking with the frequencies that the supposed key would give indicated that the key for the third group of letters was 8, or I. At this point, I realized that a reasonable keyword would be "PRIME". I just tried it out and it worked. Here is the corresponding plaintext: THE MOST FAMOUS CRYPTOLOGIST IN HISTORY OWES HIS FAME LESS TO WHAT HE DID THAN TO WHAT HE SAID-AND TO THE SENSATIONAL WAY IN WHICH HE SAID IT. AND THIS WAS MOST PERFECTLY IN CHARACTER, FOR HERBERT OSBORNE YARDLEY WAS, PERHAPS THE MOST ENGAGING, ARTICULATE, AND TECHNICOLORED PERSONALITY IN THE BUSINESS. Extra Problems -------------- a) This is true. Using the given information, we have the following: a = b + nx, for some integer x c = d + ny, for some integer y (You can get these equations by using the divisibility definition of mod followed by the definition of divisibility.) d = c - ny ad = (b + nx)(c - ny) = bc + nxc - nyb - nxyn = bc + n(cx - by - nxy) = bc (mod n) b) This is false. Consider the following counter example: 8 = 2 (mod 6) but (8/2) = (2/2) (mod 6) is false!!! c) Use the GCD algorithm as follows: 459 = 1*243 + 216 --> 216 = 459 - 1*243 243 = 1*216 + 27 --> 27 = 243 - 1*216 216 = 8*27 27 = 243 - (459 - 243) 27 = 2(243) - 459 -27 = (-2)243 + 459 A solution is x=-2, y=1. d) Since the determinant of the matrix is 69, and 69 is relatively prime to 35, this is a valid matrix to use for encryption in the Hill cipher. Now, let's get the inverse: 2 3 5 : 1 0 1 1 0 4 : 0 1 0 8 2 7 : 0 0 1 1 19 20 : 18 0 0 1 0 4 : 0 1 0 8 2 7 : 0 0 1 1 19 20 : 18 0 0 0 16 19 : 17 1 0 0 25 22 : 31 0 1 1 19 20 : 18 0 0 0 1 34 : 12 11 0 0 25 22 : 31 0 1 1 19 20 : 18 0 0 0 1 34 : 12 11 0 0 0 12 : 11 5 1 1 19 20 : 18 0 0 0 1 34 : 12 11 0 0 0 1 : 33 15 3 1 19 0 : 23 15 10 0 1 0 : 10 26 3 0 0 1 : 33 15 3 1 0 0 : 8 11 23 0 1 0 : 10 26 3 0 0 1 : 33 15 3 We can verify our answer as follows: (2 3 5) ( 8 11 23) (1 0 0) (1 0 4) X (10 26 3) = (0 1 0) (mod 35) (8 2 7) (33 15 3) (0 0 1) e) No. e(b) = 1^2 (mod 26) = 1 = b e(z) = 25^2 (mod 26) = 625 = 1 = b. Since two different characters encrypt to the same character, decryption is impossible and thus, this "quadratic" cryptosystem is invalid. f) When a=1, we are doing a shift cipher. If b=0, all 26 letters are fixed points. When b>0, there are no fixed points. When a>1, we need to find the number of solutions to the equation x = ax + b mod 26 -b = ax - x mod 26 -b = (a-1)x mod 26 We know a must be odd for the affine cipher to be valid. Thus, a-1 is even. If a-1 is even, then (a-1)x must be as well. We have the following: (a-1)x = 26n + r, for some integers n and r, 0 <= r < 26. 2m = 26n + r, for some integer m since a-1 is even. r = 2m - 26n r = 2(m-13n). We find that in order for there to be a solution to the equation above, b must be divisible by 2. We can deduce that when b is odd, there are no fixed points. Now consider the case where b is even. We are looking for solutions to -b = (a-1)x mod 26 Let x1 be a solution to this equation. Thus, we have -b = (a-1)(x1) mod 26 (a-1)(x1) + b = 26n, for some integer n. Since a-1 is even let 2a' = a - 1, for some integer a': 2a'(x1) + b = 26n 2a'(x1+13) + b = 2a'(x1) + 26a' + b = (2a'(x1) + b) + 26a' = 26n + 26a' = 26(n + a') So from this we see that if x1 is a solution, so is x1+13. Now, I will show that if x1 is a solution, then x1+p is not a solution if p is not divisible by 13: Assume to the contrary that 2a'(x1) + b = 26n , and 2a'(x1+p) + b = 26m, for some integer m. 2a'(x1+p) + b = 2a'(x1) + 2a'p + b = (2a'(x1) + b) + 2a'p = 26n + 2a'p For this last quantity to be divisible by 26, we must have 13 be a factor of it. Since 13 is prime, we must have 13 | a' or 13 | p. We have assumed that p is not divisible by 13, so that only leaves 13 | a'. But, if this is the case, then we have : a - 1 = 2a' a - 1 = 2(13a"), for some integer a". a = 1 + 26a" a = 1 mod 26 But remember, in this particular case, we assumed that a > 1 mod 26. That means that it is impossible for x1+p to also be a solution if x1 is. So, what we have established so far is when a > 1 and b is even, there will either be 0 or 2 solutions. (Two because if there is one solution x1, the other solution will be x1+13.) Now, we must prove the existence of a solution for each case. I claim that for the values x=0, 1, 2, ..., 12, the value of (a-1)x mod 26 will be 0, 2, 4, ... , 24, not necessarily respectively. We know that the value must be even. Now it must be shown that for each of these 13 values of x, we must get distinct values of (a-1)x. Assume to the contrary, that there is no solution. That means that two of the above values of x must give rise to the same value of (a-1)x mod 26. Let these values be x1 and x2, with x1 > x2: (a-1)x1 = (a-1)x2 mod 26 (a-1)(x1-x2) = 0 mod 26 Thus we have 13 | (a-1) or 13 | (x1-x2). As shown before, the first case is impossible as is the second case because the minimum value of x1 - x2 is 1 and the maximum value is 12. 13 can not divide this quantity. Thus, we can conclude that two distinct solutions x1 and x2 to the equation above do not exist. This means that (a-1)x mod 26 assumes all possible even values as x ranges from 0 to 12. Since we have assumed b to be even, -b = (a-1)x mod 26 must have a solution since -b is also even, mod 26. Since it has one solution, it must have two. Thus, we can characterize the number of fixed points as follows: # of fixed points Values of a and b ----------------- ----------------- 0 any time b is odd, or a=1 and b>0. 2 any time a>1 and b is even 26 a=1,b=0. Average number of fixed points = (0*(25+11*13) + 2*(11*13) + 26*1)/312 = (0 + 286 + 26)/312 = 1.