From: Anton Stiglic 
To: Adam Back
Cc: Mads Rasmussen
Subject: RE : RE : Collision in SHA-0 and stuff on SHA-1
Date: Mon, 16 Aug 2004 10:29:41 -0400

O.k., you attack was different than the one I implemented against MD5.  Here
is the email message you posted (including instructions to run the code, it
was for SHA1), with replies including explanation of the attack that I
implemented.  I found it was a very intersting discussion (I'm copying Mads
on this since he is intersted in the subject as well)...

From:  Adam Back <adam@cypherspace.org> 
Date:  Wed May 28, 2003  4:26 am 
Subject:  can we get beyond 16 round reduced-round SHA1 collision?
 
So looking more at SHA1's internal's with a view to examining what it
would take to construct a collision...

You'll recall from earlier discussion SHA1's internals look like for
hash of y1 || y2 (presuming the SHA1 padding has already been done):

c1 = E_y1(IV)+IV
h = E_y2(c1)+c1

etc.

So as previously discussed an interesting value to find would be an x
st:

E_x(IV) = 0^n

where n is the block size (160 bits), because that would allow
construcion of SHA1 hash collisions (arbitrary a and b st SHA1(a) ==
SHA1(b) ), and 2nd pre-image collisions (given a find SHA1(a) ==
SHA1(b)) as these would simply be, respectively SHA1(c) == SHA1(x||c)
== SHA1(x||x||c) etc. for all |c|>512 (to avoid padding issues); and
SHA1(a) == SHA1(x||a) == SHA1(x||x||a) etc for all |a|>512 (again to
avoid padding issues).


So if you look at SHA1 internals x is expanded in a key-schedule like
operation up to 80 32-bit words from the initial 16 32-bit words.

So if we look at x as 16 32-bit words x = x0||...||x15

So if we adaptively choose x1...x16 we can trivially make the internal
state go to 0^160 and keep it there until round 15 (for a real x
facilitating collision we need to expand that to round 79)... (Note
this is not an interesting observation so far.. it only gets
interesting after round 15). (Rounds numbered 0..79 to be consistent
with SHA1 test vector format and code).

So I put the software I've been using to tinker with this here:

http://www.cypherspace.org/adam/sha1int/

If you run the sha1try program it will create x values which set A
(first 32 bit word of 160 bit state composed of the 5 32-bit words
(A,B,C,D,E)) to 0 for rounds between s and f when called like this:

% sha1try s f

this then has the knock-on effect of setting B,C,D,E to 0 in
subsequent rounds as the round function is called.

For example to set as many of the A,...,E values in rounds 0..15 to
0^n as possible call it like this:

% sha1try 0 15
0 604B674D
1 994F32F3
2 90CF3E87
3 CFB8D2C5
4 4BAC3DA7
5 A57D8667
6 A57D8667
7 A57D8667
8 A57D8667
9 A57D8667
10 A57D8667
11 A57D8667
12 A57D8667
13 A57D8667
14 A57D8667
15 A57D8667

So this is the input block. Note the regularity after round 4. This
is because at this point A,...,E are all 0^32 and the round function
is much simpler with all 5 inputs set to 0^32.

To view the effect of this:

% sha1try 0 15 | awk '{print $2}' | dehex32 | sha1 |& less
The hex values of A,B,C,D,E after pass t of the "for t = 0 to 79" loop (step
(d) of Section 7 or step (c) of Section 8) are

A B C D E

t = 0: 00000000 67452301 7BF36AE2 98BADCFE 10325476
t = 1: 00000000 00000000 59D148C0 7BF36AE2 98BADCFE
t = 2: 00000000 00000000 00000000 59D148C0 7BF36AE2
t = 3: 00000000 00000000 00000000 00000000 59D148C0
t = 4: 00000000 00000000 00000000 00000000 00000000
t = 5: 00000000 00000000 00000000 00000000 00000000
t = 6: 00000000 00000000 00000000 00000000 00000000
t = 7: 00000000 00000000 00000000 00000000 00000000
t = 8: 00000000 00000000 00000000 00000000 00000000
t = 9: 00000000 00000000 00000000 00000000 00000000
t = 10: 00000000 00000000 00000000 00000000 00000000
t = 11: 00000000 00000000 00000000 00000000 00000000
t = 12: 00000000 00000000 00000000 00000000 00000000
t = 13: 00000000 00000000 00000000 00000000 00000000
t = 14: 00000000 00000000 00000000 00000000 00000000
t = 15: 00000000 00000000 00000000 00000000 00000000
t = 16: 3B8B2D2E 00000000 00000000 00000000 00000000
t = 17: 79D7DFCC 3B8B2D2E 00000000 00000000 00000000
t = 18: 4C447969 79D7DFCC 8EE2CB4B 00000000 00000000
t = 19: 493534AA 4C447969 1E75F7F3 8EE2CB4B 00000000
t = 20: 3EDBC252 493534AA 53111E5A 1E75F7F3 8EE2CB4B
[...]
t = 79: E482557D 43EDB42F 54762B51 C5900DAE EB913AA4.

so you can see the 0s take some time to propagate out even after round
15 because the round function is less effective with some inputs set
to 0^32.

So of course to get round 16 it becomes "interesting" because we can't
independelty choose x0||...||x15 as x16 is computed based on it using
the "key schedule" recurrence relation:

16 <= i <= 79 Xi = (X_{i-3} ^ X_{i-8} ^ X_{i-14} ^ X_{i-16}) <<< 1

(all where ^ is xor, +, - are addition and subtraction (outside of
subscripts) are mod 2^32, and <<<, >>> are rotate left and right
respectively by n bits.)


So I was able to manually adjust inputs with the help of the sha1try
program to set A16 to 0^32 (numbering (Ai,Bi,Ci,Di,Ei) to correspond
with the round number 0..79).

And it seems that generally there is a set of simultaneous equations
defined by the SHA1 function up to round k which if a single solution
is found for give an x such that the (Ak,Bk,Ck,Dk,Ek) =
(0^32,0^32,0^32,0^32,0^32).

Now the supposition with SHA1 is that be hard to find a single
solution to this set of equations for k=79.

But I wonder how far one can get.

One aspect that is inconvenient is the SHA1 round functions and
key-schedule recurrence relation involve a number of different
operations: <<<, >>>, ^ | & ! and + on 32-bit words. Equation solving
usually doesn't involve these operations. One approach might be to
define SHA1 in terms of bit operations where Ak_j is the j-th bit of
value A on round k, and xk_j is the j-th bit of input xk and include
all of the 32 bit carries, rotations, xor, or, and negations in that
overall equation set. This would be a fairly complex equation set to
be sure, but might be more amenable to input into equation solving
software.

I'll see if I can re-create my A16 = 0^32 input and post separately.

Adam 


----- Original Message -----
From: "Adam Back" <adam@cypherspace.org>
To: "KDF" <KDF-list@yahoogroups.com>
Cc: "Adam Back" <adam@cypherspace.org>
Sent: Tuesday, May 27, 2003 11:26 PM
Subject: [KDF-list] can we get beyond 16 round reduced-round SHA1 collision?


> So looking more at SHA1's internal's with a view to examining what it
> would take to construct a collision...
>
> You'll recall from earlier discussion SHA1's internals look like for
> hash of y1 || y2 (presuming the SHA1 padding has already been done):
>
> c1 = E_y1(IV)+IV
> h = E_y2(c1)+c1
>
> etc.
>
> So as previously discussed an interesting value to find would be an x
> st:
>
> E_x(IV) = 0^n
>
> where n is the block size (160 bits), because that would allow
> construcion of SHA1 hash collisions (arbitrary a and b st SHA1(a) ==
> SHA1(b) ), and 2nd pre-image collisions (given a find SHA1(a) ==
> SHA1(b)) as these would simply be, respectively SHA1(c) == SHA1(x||c)
> == SHA1(x||x||c) etc. for all |c|>512 (to avoid padding issues); and
> SHA1(a) == SHA1(x||a) == SHA1(x||x||a) etc for all |a|>512 (again to
> avoid padding issues).

The fact that SHA1 encodes the length of the input that is hashed in the
last
64 bits (Merkle-Damgard stengthening) will prevent you from successfully
using this trick to find collisions with the whole SHA1.
SHA1(c) != SHA1(x || c) because of this, even if E_x(IV) = 000...0

I think it would still be interesting to find such an x however...

I was thinking about the link between finding collisions in the compression
function and demonstrating that the compression function doesn't act
like a PRF.

For example, collisions with random IV have been found for the MD5
compression function. Dobbertin found an arbitrary IV', and x1 and x2, such
that E_x1(IV') + IV' = E_x2(IV') + IV'

This is not enough to say that E_x(K) + K, for a random secret K, doesn't
act random! In fact, demonstrating that this last construct is not a PRF
seems to be harder from this point of view, since you don't have control
over what K is (and you don't know what it is).

Take even the broken MD4 hash function. What Dobbertin found was
a collision in the compression function for the fixed IV defined in MD4.
In fact his attack works for any IV you choose.
He showed that one can easily find x1 and x2 such that
E_x1(IV) + IV = E_x2(IV) + IV
And this can be extended to finding collisions in MD4 as a whole, even
with the fact that MD4 uses MD-hardening.

But this is not enough to say that MD4 collision function doesn't act
like a PRF, since the attack assumes that the attacker *knows* the value
of IV (and for PRF the IV will be random and *secret*).

I'm not familiar enough with Dobbertin's attack to know if his techniques
can be used in some way to also demonstrate that MD4 compression function
is not a PRF.

--Anton

From:  Adam Back <adam@cypherspace.org> 
Date:  Wed May 28, 2003  10:44 pm 
Subject:  Re: [KDF-list] can we get beyond 16 round reduced-round SHA1
collision?

 
On Wed, May 28, 2003 at 10:55:52AM -0400, Anton Stiglic wrote:
> [...]
> The fact that SHA1 encodes the length of the input that is hashed in
> the last 64 bits (Merkle-Damgard stengthening) will prevent you from
> successfully using this trick to find collisions with the whole
> SHA1. SHA1(c) != SHA1(x || c) because of this, even if E_x(IV) =
> 000...0

Yes, braino on my part (focussing on internals didn't think too
carefully about applicability to full hash).

You're right, finding a single fixed point of the compression function
starting from IV is insufficient to generate collisions at the hash
level.

Finding x st E_x(IV) = 0^160, and the compression function is F_x(y) =
E_x(y)+y so finding a key x such that the encryption of IV under x
gives the 0^160 ciphertext corresponds to an interesting fixed point
in the compression function.

> I think it would still be interesting to find such an x however...

Yes. I think the main reason it would be practically interesting
would be if the same approach allows computing multiple alternate x
values such that E_x(IV) = 0; if we had E_x(IV) = 0 and E_x'(IV) = 0,
then we could compute full hash collisions on strings such as:

H(x||a) == H(x'||a)

and:

H(x||x||a) == H(x||x'||a) == H(x'||x||a) == H(x'||x'||a)

So this type of pair of fixed points starting from the IV is
interesting because even if it takes a huge amount of computation to
achieve once achieved it would make SHA1 no longer practically
collision free. The collisions have to have particular forms though,
so the practical implications may not be as directly interesting
unless you can manipulate protocols involving hashes of data to work
with x or x'.

It seems likely that an algorithm that could create one IV start fixed
point could be run again with a different input to create a second IV
start fixed point.

(And also that the same algorithm could be used to find other chosen
fixed points, presuming there is nothing special about the IV).

As you say finding a single (arbitrary, non IV) fixed point has lesser
practical uses.

Adam
 

From:  John Kelsey <kelsey.j@ix.netcom.com> 
Date:  Wed May 28, 2003  9:46 pm 
Subject:  Re: [KDF-list] can we get beyond 16 round reduced-round SHA1
collision?

 
At 10:55 AM 5/28/03 -0400, Anton Stiglic wrote:
...
>I was thinking about the link between finding collisions in the compression
>function and demonstrating that the compression function doesn't act
>like a PRF.

Yep, this is kind of tricky. PRFs and hash functions are very different
animals. It's not that one is stronger than the other, it's that they have
really different requirements.

>For example, collisions with random IV have been found for the MD5
>compression function. Dobbertin found an arbitrary IV', and x1 and x2, such
>that E_x1(IV') + IV' = E_x2(IV') + IV'
>
>This is not enough to say that E_x(K) + K, for a random secret K, doesn't
>act random! In fact, demonstrating that this last construct is not a PRF
>seems to be harder from this point of view, since you don't have control
>over what K is (and you don't know what it is).

Yep. One thing that makes hash functions hard to think about in the PRF
model is that there is really no secret key anywhere. The attacker knows
exactly as much as the legitimate users.

For example, suppose I specify a PRF as follows:

PRF(k,x) = AES_k(x)

This is a reasonable PRF, and will generally pass tests up to about
2^{64}. But it wouldn't be worth anything at all as a hash function!

And similarly, it's possible to imagine many collision-resistant hash
functions with all kinds of yucky properties that would ruin them as
PRFs. As a simple example, consider

compress(C,M) = SHA1_Compress(C,M) || parity(M)

where || is bit concatenation. This is trivial to distinguish from a
random function, but is still a good collision resistant function.

Finally, you get what people expect from hash functions, which is usually
some kind of random oracle behavior that's really unattainable from a
deterministic function. (I suspect the best that's possible is that the
compression function has the property that there's no way to distinguish
its output from a random 160-bit number that's much cheaper than computing
the compression function on the inputs, guessing any unknown inputs along
the way. I've never heard of anyone using that as a model for describing
hash functions, but it's intuitively pleasing to me, at least.)

>--Anton

--John Kelsey, kelsey.j@ix.netcom.com
PGP: FA48 3237 9AD5 30AC EEDD BBC8 2A80 6948 4CAA F259


From:  "Anton Stiglic" <astiglic@okiok.com> 
Date:  Wed May 28, 2003  7:55 pm 
Subject:  Re: [KDF-list] can we get beyond 16 round reduced-round SHA1
collision?

----- Original Message -----
From: "Mads Rasmussen" <mads@opencs.com.br>
To: <KDF-list@yahoogroups.com>
Sent: Wednesday, May 28, 2003 2:34 PM
Subject: RES: [KDF-list] can we get beyond 16 round reduced-round SHA1
collision?


> > For example, collisions with random IV have been found for the MD5
> > compression function. Dobbertin found an arbitrary IV', and x1 and
> x2,
> > such
> > that E_x1(IV') + IV' = E_x2(IV') + IV'
>
> Wasn't it just pseudo collisions?

In the second phrase maybe I should have used the word *random* again,
instead
of arbitrary. This is the terminology that HAC uses for example
(collision for random IV = pseudo collision).

Essentially Dobbertin chose particular values of x1 and x2, and then
computed
an IV' (based on the values of x1 and x2 and other, random looking, stuff)
that
would give him what he wanted (the collision). He didn't have control over
the
choice of IV' that would make things work, thus the wording "random IV".

If he could do the same for any *given* IV, then the attack could be
extended
to MD5 as a whole (as was the case for MD4).

--Anton

 
From:  "Anton Stiglic" <astiglic@okiok.com> 
Date:  Thu May 29, 2003  3:04 pm 
Subject:  Re: [KDF-list] can we get beyond 16 round reduced-round SHA1
collision?

 
A little remark about finding an x such that E_x(IV) = 000...0, and
extending this to finding collisions in the whole hash function:

In the description of SHA1, they define SHA1 to work only on inputs
smaller than 2^64 bits. However, this limitation does not exist in the
definition of MD5. For MD5, if the message is longer than 2^64 bits,
the value of the length you will put at the end is b % 2^64, where b
is the number of bits in he message. See [*] bellow.

I found that funny when I first noticed that, but now I think I might
know one of the reasons why the changed it that way.

So if you could find x such that E_x(IV) = 00...0 for MD5, then
you could extend this to a collision of theoretical interest.
Indeed, for any c, you would have a collision with

c and c' := x || x || ... || x || c

where x is repeated 2^64 / 512 times.

This is because c and c' will both be padded the same way
(the 100...0 padding will be the same and the length that will
be appended will be equal in both cases).

I say this is a collision of theoretical interest, since c'
would be pretty large, about 2 million terabytes large.
It's also theoretical because the could have defined MD5 to
not accept messages larger than 2^64, but they simply didn't
and you can maybe abuse this fact.

--Anton

[*]
FIPS 180-1 (http://www.itl.nist.gov/fipspubs/fip180-1.htm)
describes SHA1 and states that the input length is to have bit-length less
than 2^64 bits.
RFC 3174 also states that the input is to be smaller than 2^64
(http://www.faqs.org/rfcs/rfc3174.html):

When a message of any length < 2^64 bits is input, the SHA-1
produces a 160-bit output called a message digest.

The padding that is described for SHA1 is as follows:
The following specifies how this padding shall be performed. As a
summary, a
"1" followed by m "0"s followed by a 64-bit integer are appended to the
end of
the message to produce a padded message of length 512 * n. The 64-bit
integer
is l, the length of the original message


Padding for MD4 is different, as described in RFC 1320
(http://theory.lcs.mit.edu/~cis/pubs/rivest/Rivest-MD4.txt)
MD4 accepts an input of bit-length greater than 2^64 bits:

A 64-bit representation of b (the length of the message before the
padding bits were added) is appended to the result of the previous
step. In the unlikely event that b is greater than 2^64, then only
the low-order 64 bits of b are used.

The difference is that here you do a mod 2^64 if the message bit-length is
greater or equal to 2^64, with SHA1 you don't need to mention this since
the message will always be smaller than 2^64.

MD5 is similar to MD4 in this respect.

 
From:  John Kelsey <kelsey.j@ix.netcom.com> 
Date:  Wed May 28, 2003  9:32 pm 
Subject:  Re: [KDF-list] can we get beyond 16 round reduced-round SHA1
collision?

 
At 04:26 AM 5/28/03 +0100, Adam Back wrote:
...
>So as previously discussed an interesting value to find would be an x
>st:
>
>E_x(IV) = 0^n
>
>where n is the block size (160 bits), because that would allow
>construcion of SHA1 hash collisions (arbitrary a and b st SHA1(a) ==
>SHA1(b) ), and 2nd pre-image collisions (given a find SHA1(a) ==
>SHA1(b)) as these would simply be, respectively SHA1(c) == SHA1(x||c)
>== SHA1(x||x||c) etc. for all |c|>512 (to avoid padding issues); and
>SHA1(a) == SHA1(x||a) == SHA1(x||x||a) etc for all |a|>512 (again to
>avoid padding issues).

This won't work quite the way you're describing it, because SHA1 also
includes a length of message block at the end of the message. So, while
the intermediate state of SHA1 after processing a or a||x will be the same,
their final hashes will have to be different, because the two input
messages aren't the same size. This seems closely related to the fixed
points I talked about earlier. There, we could easily choose any message
block X and find some chaining value C such that

compress(C,X) = C

by just computing C = D_X(0). Each message block has one of these values.

If we could get to that chaining variable C, we'd be onto something, but
we'd still have to deal with the message length at the end to actually find
collisions on the hash function from it.

Here (assuming I'm not missing something), you're talking about finding
that kind of fixed point for a specific input chaining variable, so that
you're given C and must find X such that F(C,X) =C. This looks pretty hard
to do with SHA1, because the whole design is built around stopping just
this kind of choosing-message-to-manipulate-the-output-value attack.
...
>So if you look at SHA1 internals x is expanded in a key-schedule like
>operation up to 80 32-bit words from the initial 16 32-bit words.
>
>So if we look at x as 16 32-bit words x = x0||...||x15
>
>So if we adaptively choose x1...x16 we can trivially make the internal
>state go to 0^160 and keep it there until round 15 (for a real x
>facilitating collision we need to expand that to round 79)... (Note
>this is not an interesting observation so far.. it only gets
>interesting after round 15). (Rounds numbered 0..79 to be consistent
>with SHA1 test vector format and code).

Right. There are many different messages that will get to any given
desired internal state for SHA1 after 16 rounds. The whole point of the
SHA1 design is that you then have to deal with that overlap of the message
words being used over and over again.

The most succinct way I can think of to describe the goal of hash function
design is that you're trying to give the attacker too many things to
control at once. With MD4, the message is processed sequentially three
times (in different word order each time), with MD5 it's processed
sequentially four times (in different word order, and with different
additive constants per word). The idea is clearly that the attacker will
have too many different things to control to be able to choose or influence
the output value. RIPEMD and RIPEMD160 process the message in two parallel
lines: RIPEMD processes the whole message six times total, and RIPEMD160
does it (I think) eight times total. (That's from memory; I may be getting
it wrong.)

..
>One aspect that is inconvenient is the SHA1 round functions and
>key-schedule recurrence relation involve a number of different
>operations: <<<, >>>, ^ | & ! and + on 32-bit words. Equation solving
>usually doesn't involve these operations. One approach might be to
>define SHA1 in terms of bit operations where Ak_j is the j-th bit of
>value A on round k, and xk_j is the j-th bit of input xk and include
>all of the 32 bit carries, rotations, xor, or, and negations in that
>overall equation set. This would be a fairly complex equation set to
>be sure, but might be more amenable to input into equation solving
>software.

Have you looked at the Crypto paper of a couple years ago? They analyzed a
simplified version of the original SHA (identical to SHA1 except without
the rotates in the key schedule), and they started by analyzing variants
with only XORs.

--John Kelsey, kelsey.j@ix.netcom.com
PGP: FA48 3237 9AD5 30AC EEDD BBC8 2A80 6948 4CAA F259

 
From:  "Anton Stiglic" <astiglic@okiok.com> 
Date:  Fri May 30, 2003  5:02 am 
Subject:  Re: [KDF-list] can we get beyond 16 round reduced-round SHA1
collision?

 
On getting 0^32, 0^32, 0^32, 0^32 in the furthest step...
For MD4 and MD5 it's easy to get (0^32, 0^32, 0^32, 0^32)
as the internal state at step 18.

Mads sent me a thesis that explains very nicely how to compute
the pre-image that will give you the state you want at step 18.

So I wrote some code that would compute Xi's needed so that
the state at step 18 is all 0's, for MD5.

The output of the computation with these Xi's looks like this:

IV 67452301 EFCDAB89 98BADCFE 10325476
i A B C D
0 A51FE774 EFCDAB89 98BADCFE 10325476
1 A51FE774 EFCDAB89 98BADCFE 219552DE
2 A51FE774 EFCDAB89 93F2AB50 219552DE
3 A51FE774 CEE87580 93F2AB50 219552DE
4 1775B61F CEE87580 93F2AB50 219552DE
5 1775B61F CEE87580 93F2AB50 0CDA451E
6 1775B61F CEE87580 58DC4A44 0CDA451E
7 1775B61F 01135483 58DC4A44 0CDA451E
8 E85B5E49 01135483 58DC4A44 0CDA451E
9 E85B5E49 01135483 58DC4A44 516888D4
10 E85B5E49 01135483 55D9BB42 516888D4
11 E85B5E49 787AAA00 55D9BB42 516888D4
12 49E1396F 787AAA00 55D9BB42 516888D4
13 49E1396F 787AAA00 55D9BB42 3FBF4CC0
14 49E1396F 787AAA00 D9A1A5AF 3FBF4CC0
15 49E1396F 00000000 D9A1A5AF 3FBF4CC0
16 00000000 00000000 D9A1A5AF 3FBF4CC0
17 00000000 00000000 D9A1A5AF 00000000
18 00000000 00000000 00000000 00000000
19 00000000 7AAE9B6C 00000000 00000000
20 4090A726 7AAE9B6C 00000000 00000000
21 4090A726 7AAE9B6C 00000000 25F0261F
22 4090A726 7AAE9B6C 06820BC9 25F0261F

We can keep a 0 word up step 21.

The code I have is a bit messy, but if anyone is interested
I can send it.

--Anton