From: Anton Stiglic To: Adam Back Cc: Mads Rasmussen Subject: RE : RE : Collision in SHA-0 and stuff on SHA-1 Date: Mon, 16 Aug 2004 10:29:41 -0400 O.k., you attack was different than the one I implemented against MD5. Here is the email message you posted (including instructions to run the code, it was for SHA1), with replies including explanation of the attack that I implemented. I found it was a very intersting discussion (I'm copying Mads on this since he is intersted in the subject as well)... From: Adam Back Date: Wed May 28, 2003 4:26 am Subject: can we get beyond 16 round reduced-round SHA1 collision? So looking more at SHA1's internal's with a view to examining what it would take to construct a collision... You'll recall from earlier discussion SHA1's internals look like for hash of y1 || y2 (presuming the SHA1 padding has already been done): c1 = E_y1(IV)+IV h = E_y2(c1)+c1 etc. So as previously discussed an interesting value to find would be an x st: E_x(IV) = 0^n where n is the block size (160 bits), because that would allow construcion of SHA1 hash collisions (arbitrary a and b st SHA1(a) == SHA1(b) ), and 2nd pre-image collisions (given a find SHA1(a) == SHA1(b)) as these would simply be, respectively SHA1(c) == SHA1(x||c) == SHA1(x||x||c) etc. for all |c|>512 (to avoid padding issues); and SHA1(a) == SHA1(x||a) == SHA1(x||x||a) etc for all |a|>512 (again to avoid padding issues). So if you look at SHA1 internals x is expanded in a key-schedule like operation up to 80 32-bit words from the initial 16 32-bit words. So if we look at x as 16 32-bit words x = x0||...||x15 So if we adaptively choose x1...x16 we can trivially make the internal state go to 0^160 and keep it there until round 15 (for a real x facilitating collision we need to expand that to round 79)... (Note this is not an interesting observation so far.. it only gets interesting after round 15). (Rounds numbered 0..79 to be consistent with SHA1 test vector format and code). So I put the software I've been using to tinker with this here: http://www.cypherspace.org/adam/sha1int/ If you run the sha1try program it will create x values which set A (first 32 bit word of 160 bit state composed of the 5 32-bit words (A,B,C,D,E)) to 0 for rounds between s and f when called like this: % sha1try s f this then has the knock-on effect of setting B,C,D,E to 0 in subsequent rounds as the round function is called. For example to set as many of the A,...,E values in rounds 0..15 to 0^n as possible call it like this: % sha1try 0 15 0 604B674D 1 994F32F3 2 90CF3E87 3 CFB8D2C5 4 4BAC3DA7 5 A57D8667 6 A57D8667 7 A57D8667 8 A57D8667 9 A57D8667 10 A57D8667 11 A57D8667 12 A57D8667 13 A57D8667 14 A57D8667 15 A57D8667 So this is the input block. Note the regularity after round 4. This is because at this point A,...,E are all 0^32 and the round function is much simpler with all 5 inputs set to 0^32. To view the effect of this: % sha1try 0 15 | awk '{print $2}' | dehex32 | sha1 |& less The hex values of A,B,C,D,E after pass t of the "for t = 0 to 79" loop (step (d) of Section 7 or step (c) of Section 8) are A B C D E t = 0: 00000000 67452301 7BF36AE2 98BADCFE 10325476 t = 1: 00000000 00000000 59D148C0 7BF36AE2 98BADCFE t = 2: 00000000 00000000 00000000 59D148C0 7BF36AE2 t = 3: 00000000 00000000 00000000 00000000 59D148C0 t = 4: 00000000 00000000 00000000 00000000 00000000 t = 5: 00000000 00000000 00000000 00000000 00000000 t = 6: 00000000 00000000 00000000 00000000 00000000 t = 7: 00000000 00000000 00000000 00000000 00000000 t = 8: 00000000 00000000 00000000 00000000 00000000 t = 9: 00000000 00000000 00000000 00000000 00000000 t = 10: 00000000 00000000 00000000 00000000 00000000 t = 11: 00000000 00000000 00000000 00000000 00000000 t = 12: 00000000 00000000 00000000 00000000 00000000 t = 13: 00000000 00000000 00000000 00000000 00000000 t = 14: 00000000 00000000 00000000 00000000 00000000 t = 15: 00000000 00000000 00000000 00000000 00000000 t = 16: 3B8B2D2E 00000000 00000000 00000000 00000000 t = 17: 79D7DFCC 3B8B2D2E 00000000 00000000 00000000 t = 18: 4C447969 79D7DFCC 8EE2CB4B 00000000 00000000 t = 19: 493534AA 4C447969 1E75F7F3 8EE2CB4B 00000000 t = 20: 3EDBC252 493534AA 53111E5A 1E75F7F3 8EE2CB4B [...] t = 79: E482557D 43EDB42F 54762B51 C5900DAE EB913AA4. so you can see the 0s take some time to propagate out even after round 15 because the round function is less effective with some inputs set to 0^32. So of course to get round 16 it becomes "interesting" because we can't independelty choose x0||...||x15 as x16 is computed based on it using the "key schedule" recurrence relation: 16 <= i <= 79 Xi = (X_{i-3} ^ X_{i-8} ^ X_{i-14} ^ X_{i-16}) <<< 1 (all where ^ is xor, +, - are addition and subtraction (outside of subscripts) are mod 2^32, and <<<, >>> are rotate left and right respectively by n bits.) So I was able to manually adjust inputs with the help of the sha1try program to set A16 to 0^32 (numbering (Ai,Bi,Ci,Di,Ei) to correspond with the round number 0..79). And it seems that generally there is a set of simultaneous equations defined by the SHA1 function up to round k which if a single solution is found for give an x such that the (Ak,Bk,Ck,Dk,Ek) = (0^32,0^32,0^32,0^32,0^32). Now the supposition with SHA1 is that be hard to find a single solution to this set of equations for k=79. But I wonder how far one can get. One aspect that is inconvenient is the SHA1 round functions and key-schedule recurrence relation involve a number of different operations: <<<, >>>, ^ | & ! and + on 32-bit words. Equation solving usually doesn't involve these operations. One approach might be to define SHA1 in terms of bit operations where Ak_j is the j-th bit of value A on round k, and xk_j is the j-th bit of input xk and include all of the 32 bit carries, rotations, xor, or, and negations in that overall equation set. This would be a fairly complex equation set to be sure, but might be more amenable to input into equation solving software. I'll see if I can re-create my A16 = 0^32 input and post separately. Adam ----- Original Message ----- From: "Adam Back" To: "KDF" Cc: "Adam Back" Sent: Tuesday, May 27, 2003 11:26 PM Subject: [KDF-list] can we get beyond 16 round reduced-round SHA1 collision? > So looking more at SHA1's internal's with a view to examining what it > would take to construct a collision... > > You'll recall from earlier discussion SHA1's internals look like for > hash of y1 || y2 (presuming the SHA1 padding has already been done): > > c1 = E_y1(IV)+IV > h = E_y2(c1)+c1 > > etc. > > So as previously discussed an interesting value to find would be an x > st: > > E_x(IV) = 0^n > > where n is the block size (160 bits), because that would allow > construcion of SHA1 hash collisions (arbitrary a and b st SHA1(a) == > SHA1(b) ), and 2nd pre-image collisions (given a find SHA1(a) == > SHA1(b)) as these would simply be, respectively SHA1(c) == SHA1(x||c) > == SHA1(x||x||c) etc. for all |c|>512 (to avoid padding issues); and > SHA1(a) == SHA1(x||a) == SHA1(x||x||a) etc for all |a|>512 (again to > avoid padding issues). The fact that SHA1 encodes the length of the input that is hashed in the last 64 bits (Merkle-Damgard stengthening) will prevent you from successfully using this trick to find collisions with the whole SHA1. SHA1(c) != SHA1(x || c) because of this, even if E_x(IV) = 000...0 I think it would still be interesting to find such an x however... I was thinking about the link between finding collisions in the compression function and demonstrating that the compression function doesn't act like a PRF. For example, collisions with random IV have been found for the MD5 compression function. Dobbertin found an arbitrary IV', and x1 and x2, such that E_x1(IV') + IV' = E_x2(IV') + IV' This is not enough to say that E_x(K) + K, for a random secret K, doesn't act random! In fact, demonstrating that this last construct is not a PRF seems to be harder from this point of view, since you don't have control over what K is (and you don't know what it is). Take even the broken MD4 hash function. What Dobbertin found was a collision in the compression function for the fixed IV defined in MD4. In fact his attack works for any IV you choose. He showed that one can easily find x1 and x2 such that E_x1(IV) + IV = E_x2(IV) + IV And this can be extended to finding collisions in MD4 as a whole, even with the fact that MD4 uses MD-hardening. But this is not enough to say that MD4 collision function doesn't act like a PRF, since the attack assumes that the attacker *knows* the value of IV (and for PRF the IV will be random and *secret*). I'm not familiar enough with Dobbertin's attack to know if his techniques can be used in some way to also demonstrate that MD4 compression function is not a PRF. --Anton From: Adam Back Date: Wed May 28, 2003 10:44 pm Subject: Re: [KDF-list] can we get beyond 16 round reduced-round SHA1 collision? On Wed, May 28, 2003 at 10:55:52AM -0400, Anton Stiglic wrote: > [...] > The fact that SHA1 encodes the length of the input that is hashed in > the last 64 bits (Merkle-Damgard stengthening) will prevent you from > successfully using this trick to find collisions with the whole > SHA1. SHA1(c) != SHA1(x || c) because of this, even if E_x(IV) = > 000...0 Yes, braino on my part (focussing on internals didn't think too carefully about applicability to full hash). You're right, finding a single fixed point of the compression function starting from IV is insufficient to generate collisions at the hash level. Finding x st E_x(IV) = 0^160, and the compression function is F_x(y) = E_x(y)+y so finding a key x such that the encryption of IV under x gives the 0^160 ciphertext corresponds to an interesting fixed point in the compression function. > I think it would still be interesting to find such an x however... Yes. I think the main reason it would be practically interesting would be if the same approach allows computing multiple alternate x values such that E_x(IV) = 0; if we had E_x(IV) = 0 and E_x'(IV) = 0, then we could compute full hash collisions on strings such as: H(x||a) == H(x'||a) and: H(x||x||a) == H(x||x'||a) == H(x'||x||a) == H(x'||x'||a) So this type of pair of fixed points starting from the IV is interesting because even if it takes a huge amount of computation to achieve once achieved it would make SHA1 no longer practically collision free. The collisions have to have particular forms though, so the practical implications may not be as directly interesting unless you can manipulate protocols involving hashes of data to work with x or x'. It seems likely that an algorithm that could create one IV start fixed point could be run again with a different input to create a second IV start fixed point. (And also that the same algorithm could be used to find other chosen fixed points, presuming there is nothing special about the IV). As you say finding a single (arbitrary, non IV) fixed point has lesser practical uses. Adam From: John Kelsey Date: Wed May 28, 2003 9:46 pm Subject: Re: [KDF-list] can we get beyond 16 round reduced-round SHA1 collision? At 10:55 AM 5/28/03 -0400, Anton Stiglic wrote: ... >I was thinking about the link between finding collisions in the compression >function and demonstrating that the compression function doesn't act >like a PRF. Yep, this is kind of tricky. PRFs and hash functions are very different animals. It's not that one is stronger than the other, it's that they have really different requirements. >For example, collisions with random IV have been found for the MD5 >compression function. Dobbertin found an arbitrary IV', and x1 and x2, such >that E_x1(IV') + IV' = E_x2(IV') + IV' > >This is not enough to say that E_x(K) + K, for a random secret K, doesn't >act random! In fact, demonstrating that this last construct is not a PRF >seems to be harder from this point of view, since you don't have control >over what K is (and you don't know what it is). Yep. One thing that makes hash functions hard to think about in the PRF model is that there is really no secret key anywhere. The attacker knows exactly as much as the legitimate users. For example, suppose I specify a PRF as follows: PRF(k,x) = AES_k(x) This is a reasonable PRF, and will generally pass tests up to about 2^{64}. But it wouldn't be worth anything at all as a hash function! And similarly, it's possible to imagine many collision-resistant hash functions with all kinds of yucky properties that would ruin them as PRFs. As a simple example, consider compress(C,M) = SHA1_Compress(C,M) || parity(M) where || is bit concatenation. This is trivial to distinguish from a random function, but is still a good collision resistant function. Finally, you get what people expect from hash functions, which is usually some kind of random oracle behavior that's really unattainable from a deterministic function. (I suspect the best that's possible is that the compression function has the property that there's no way to distinguish its output from a random 160-bit number that's much cheaper than computing the compression function on the inputs, guessing any unknown inputs along the way. I've never heard of anyone using that as a model for describing hash functions, but it's intuitively pleasing to me, at least.) >--Anton --John Kelsey, kelsey.j@ix.netcom.com PGP: FA48 3237 9AD5 30AC EEDD BBC8 2A80 6948 4CAA F259 From: "Anton Stiglic" Date: Wed May 28, 2003 7:55 pm Subject: Re: [KDF-list] can we get beyond 16 round reduced-round SHA1 collision? ----- Original Message ----- From: "Mads Rasmussen" To: Sent: Wednesday, May 28, 2003 2:34 PM Subject: RES: [KDF-list] can we get beyond 16 round reduced-round SHA1 collision? > > For example, collisions with random IV have been found for the MD5 > > compression function. Dobbertin found an arbitrary IV', and x1 and > x2, > > such > > that E_x1(IV') + IV' = E_x2(IV') + IV' > > Wasn't it just pseudo collisions? In the second phrase maybe I should have used the word *random* again, instead of arbitrary. This is the terminology that HAC uses for example (collision for random IV = pseudo collision). Essentially Dobbertin chose particular values of x1 and x2, and then computed an IV' (based on the values of x1 and x2 and other, random looking, stuff) that would give him what he wanted (the collision). He didn't have control over the choice of IV' that would make things work, thus the wording "random IV". If he could do the same for any *given* IV, then the attack could be extended to MD5 as a whole (as was the case for MD4). --Anton From: "Anton Stiglic" Date: Thu May 29, 2003 3:04 pm Subject: Re: [KDF-list] can we get beyond 16 round reduced-round SHA1 collision? A little remark about finding an x such that E_x(IV) = 000...0, and extending this to finding collisions in the whole hash function: In the description of SHA1, they define SHA1 to work only on inputs smaller than 2^64 bits. However, this limitation does not exist in the definition of MD5. For MD5, if the message is longer than 2^64 bits, the value of the length you will put at the end is b % 2^64, where b is the number of bits in he message. See [*] bellow. I found that funny when I first noticed that, but now I think I might know one of the reasons why the changed it that way. So if you could find x such that E_x(IV) = 00...0 for MD5, then you could extend this to a collision of theoretical interest. Indeed, for any c, you would have a collision with c and c' := x || x || ... || x || c where x is repeated 2^64 / 512 times. This is because c and c' will both be padded the same way (the 100...0 padding will be the same and the length that will be appended will be equal in both cases). I say this is a collision of theoretical interest, since c' would be pretty large, about 2 million terabytes large. It's also theoretical because the could have defined MD5 to not accept messages larger than 2^64, but they simply didn't and you can maybe abuse this fact. --Anton [*] FIPS 180-1 (http://www.itl.nist.gov/fipspubs/fip180-1.htm) describes SHA1 and states that the input length is to have bit-length less than 2^64 bits. RFC 3174 also states that the input is to be smaller than 2^64 (http://www.faqs.org/rfcs/rfc3174.html): When a message of any length < 2^64 bits is input, the SHA-1 produces a 160-bit output called a message digest. The padding that is described for SHA1 is as follows: The following specifies how this padding shall be performed. As a summary, a "1" followed by m "0"s followed by a 64-bit integer are appended to the end of the message to produce a padded message of length 512 * n. The 64-bit integer is l, the length of the original message Padding for MD4 is different, as described in RFC 1320 (http://theory.lcs.mit.edu/~cis/pubs/rivest/Rivest-MD4.txt) MD4 accepts an input of bit-length greater than 2^64 bits: A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b is greater than 2^64, then only the low-order 64 bits of b are used. The difference is that here you do a mod 2^64 if the message bit-length is greater or equal to 2^64, with SHA1 you don't need to mention this since the message will always be smaller than 2^64. MD5 is similar to MD4 in this respect. From: John Kelsey Date: Wed May 28, 2003 9:32 pm Subject: Re: [KDF-list] can we get beyond 16 round reduced-round SHA1 collision? At 04:26 AM 5/28/03 +0100, Adam Back wrote: ... >So as previously discussed an interesting value to find would be an x >st: > >E_x(IV) = 0^n > >where n is the block size (160 bits), because that would allow >construcion of SHA1 hash collisions (arbitrary a and b st SHA1(a) == >SHA1(b) ), and 2nd pre-image collisions (given a find SHA1(a) == >SHA1(b)) as these would simply be, respectively SHA1(c) == SHA1(x||c) >== SHA1(x||x||c) etc. for all |c|>512 (to avoid padding issues); and >SHA1(a) == SHA1(x||a) == SHA1(x||x||a) etc for all |a|>512 (again to >avoid padding issues). This won't work quite the way you're describing it, because SHA1 also includes a length of message block at the end of the message. So, while the intermediate state of SHA1 after processing a or a||x will be the same, their final hashes will have to be different, because the two input messages aren't the same size. This seems closely related to the fixed points I talked about earlier. There, we could easily choose any message block X and find some chaining value C such that compress(C,X) = C by just computing C = D_X(0). Each message block has one of these values. If we could get to that chaining variable C, we'd be onto something, but we'd still have to deal with the message length at the end to actually find collisions on the hash function from it. Here (assuming I'm not missing something), you're talking about finding that kind of fixed point for a specific input chaining variable, so that you're given C and must find X such that F(C,X) =C. This looks pretty hard to do with SHA1, because the whole design is built around stopping just this kind of choosing-message-to-manipulate-the-output-value attack. ... >So if you look at SHA1 internals x is expanded in a key-schedule like >operation up to 80 32-bit words from the initial 16 32-bit words. > >So if we look at x as 16 32-bit words x = x0||...||x15 > >So if we adaptively choose x1...x16 we can trivially make the internal >state go to 0^160 and keep it there until round 15 (for a real x >facilitating collision we need to expand that to round 79)... (Note >this is not an interesting observation so far.. it only gets >interesting after round 15). (Rounds numbered 0..79 to be consistent >with SHA1 test vector format and code). Right. There are many different messages that will get to any given desired internal state for SHA1 after 16 rounds. The whole point of the SHA1 design is that you then have to deal with that overlap of the message words being used over and over again. The most succinct way I can think of to describe the goal of hash function design is that you're trying to give the attacker too many things to control at once. With MD4, the message is processed sequentially three times (in different word order each time), with MD5 it's processed sequentially four times (in different word order, and with different additive constants per word). The idea is clearly that the attacker will have too many different things to control to be able to choose or influence the output value. RIPEMD and RIPEMD160 process the message in two parallel lines: RIPEMD processes the whole message six times total, and RIPEMD160 does it (I think) eight times total. (That's from memory; I may be getting it wrong.) .. >One aspect that is inconvenient is the SHA1 round functions and >key-schedule recurrence relation involve a number of different >operations: <<<, >>>, ^ | & ! and + on 32-bit words. Equation solving >usually doesn't involve these operations. One approach might be to >define SHA1 in terms of bit operations where Ak_j is the j-th bit of >value A on round k, and xk_j is the j-th bit of input xk and include >all of the 32 bit carries, rotations, xor, or, and negations in that >overall equation set. This would be a fairly complex equation set to >be sure, but might be more amenable to input into equation solving >software. Have you looked at the Crypto paper of a couple years ago? They analyzed a simplified version of the original SHA (identical to SHA1 except without the rotates in the key schedule), and they started by analyzing variants with only XORs. --John Kelsey, kelsey.j@ix.netcom.com PGP: FA48 3237 9AD5 30AC EEDD BBC8 2A80 6948 4CAA F259 From: "Anton Stiglic" Date: Fri May 30, 2003 5:02 am Subject: Re: [KDF-list] can we get beyond 16 round reduced-round SHA1 collision? On getting 0^32, 0^32, 0^32, 0^32 in the furthest step... For MD4 and MD5 it's easy to get (0^32, 0^32, 0^32, 0^32) as the internal state at step 18. Mads sent me a thesis that explains very nicely how to compute the pre-image that will give you the state you want at step 18. So I wrote some code that would compute Xi's needed so that the state at step 18 is all 0's, for MD5. The output of the computation with these Xi's looks like this: IV 67452301 EFCDAB89 98BADCFE 10325476 i A B C D 0 A51FE774 EFCDAB89 98BADCFE 10325476 1 A51FE774 EFCDAB89 98BADCFE 219552DE 2 A51FE774 EFCDAB89 93F2AB50 219552DE 3 A51FE774 CEE87580 93F2AB50 219552DE 4 1775B61F CEE87580 93F2AB50 219552DE 5 1775B61F CEE87580 93F2AB50 0CDA451E 6 1775B61F CEE87580 58DC4A44 0CDA451E 7 1775B61F 01135483 58DC4A44 0CDA451E 8 E85B5E49 01135483 58DC4A44 0CDA451E 9 E85B5E49 01135483 58DC4A44 516888D4 10 E85B5E49 01135483 55D9BB42 516888D4 11 E85B5E49 787AAA00 55D9BB42 516888D4 12 49E1396F 787AAA00 55D9BB42 516888D4 13 49E1396F 787AAA00 55D9BB42 3FBF4CC0 14 49E1396F 787AAA00 D9A1A5AF 3FBF4CC0 15 49E1396F 00000000 D9A1A5AF 3FBF4CC0 16 00000000 00000000 D9A1A5AF 3FBF4CC0 17 00000000 00000000 D9A1A5AF 00000000 18 00000000 00000000 00000000 00000000 19 00000000 7AAE9B6C 00000000 00000000 20 4090A726 7AAE9B6C 00000000 00000000 21 4090A726 7AAE9B6C 00000000 25F0261F 22 4090A726 7AAE9B6C 06820BC9 25F0261F We can keep a 0 word up step 21. The code I have is a bit messy, but if anyone is interested I can send it. --Anton