The quest for the most diminutive munitions program

The very latest 3 line version, please consider it a personal challenge to shorten this in any way and contribute your hackery:

#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)

The previous version (as featured on many T-shirts, and one tattoo), and its explanation have been retained here

Hall of fame:

Ken Pizzini <ken@halcyon.com> lopped a cool 20 bytes off + 4 + 2 +2 = 28!
- using dc to convert $k to binary
- +4 for skipping one of the modulus ops in the new dc code
- +2 for some clever juggling to eliminate spaces
- plus contribution of beautiful -e redundancy stuff for new usage string
- movement of usage string to length$n||die...
- I kind of lost track of byte count accounting in shortening my last re-write (last shortenings were a joint effort - me & Ken) but suffice it to say, as far as perl & dc (he *wrote* GNU dc you know?) hacking goes: Ken rules!
- Ken also contributed some bytes in the last rewrite started by Jay Lorch
Jay Lorch <lorch@cs.berkeley.edu> just blew away the works after many months of stagnation...
- found 5 bytes by making use of x^0 = 1 to replace a conditional in the dc code
- then (major contrib this one) another 4 bytes by changing the way that the program splits up stdin into blocks just less than N, the RSA modulus. He uses base-N (ie base RSA modulus) instead of expanding to nearest whole byte as the previous version did. This allows the splitting into blocks to be done in dc. It also enabled a whole string of further optimisations which Jay, Ken (Pizzini) and myself spent a few enjoyable days discovering.
Joey Hess <joey@kitenet.net> greatly improved the accuracy of the three lines of perl claim
- with his contribution of using -i and the string $^I within the program. This enabled the move to a true 3 lines including the #!/bin/perl!!
Travis Kuhn <trav@clark.edu> knocked off another 14 bytes
- by eliminating ($k,$n)=@ARGV;
- months after every one else had run out of steam, and when I had long given up hope of finding more savings!
Chris Barker <barker@ling.rochester.edu> shaved off 8 bytes
- with a fiendishly clever improvment over hal's s/// improvement
- Chris was winning for the greatest dc code until the last twist with another 4 byte optimisation of mine which worked out better on Ken's version. Chris's was previously one byte short than Kens's
<jfriedl@omron.co.jp> shaved off 8 bytes
- use of -s and cunning $w-=2*$d parts
Adam Roach <exuadam@exu.ericsson.se> shaved off 4 more!
- with a clever manipulation of the $v and $w calcuations
- plus some clever dc stuff which led to his discovery of a compatibility problem
Hal Finney <hal@rain.org> shaved off 4 bytes
- clever optimisation of s/// expressions
Tom Phoenix <rootbeer@teleport.com> shaved off another 2 bytes
- replacement of "H$v" with H.$v in 2 occurences
Dov Grobgeld <dov@orbotech.co.il>
- shaved off another byte '0'x becomes 0 x
Volker Hetzer <volker.hetzer@informatik.tu-chemnitz.de> found a related HTML bug.

If you see any way to shorten the above please contact me:

Adam Back <adam@cypherspace.org>

Please note compatibility problems with dc command P.

(If you are wondering how the above savings add up and why the script doesn't have negative size by now it's because I have rewritten it (yet again) and a couple of them are no longer used. The new code uses dc to do just about everything except packing/unpacking the file, and converting to hex. The re-write saved another 7 bytes, as well as making the code about 4 times faster for old versions of dc and about the same speed for the newest GNU dc-1.03). Ken Pizzini, Chris Barker and Adam Roach all found some more optimisations on top of this. And then I found another 4 and Ken another 2, and so it goes on. Since that I re-wrote part of it again to fix a bug, which made it over 3 lines, slowly shortened that back to 3 lines (joint effort with Ken). Some months later, Jay Lorch found a 5 byte saving, and then a different way of blocking and unblocking to less than N the RSA modulus. The reorganisation and further compression with Jay's new RSA blocking method lead to a whole string of further optimisations by Jay, Ken and myself. Joey Hess's suggestion of using -i was also adopted which greatly improved the appearance, allowing perl (well dc) code to be placed after the #!/bin/perl.

#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)

The commented version

This used to be based on Hal Finney's reply to my original post on the cypherpunks list. However it has undergone so many revisions that little remains of the original.

Enjoy:-)

#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj # # usage: # # rsa -e -k=public-key -n=rsa-modulus < msg > msg.rsa # rsa -d -k=private-key -n=rsa-modulus < msg.rsa > msg.out # # some time ago Ken made -e redundant, by altering some perl # expressions (and saving some bytes at the same time) which made -e # (encrypt) the default behaviour: # # rsa -k=public-key -n=rsa-modulus < msg > msg.rsa # rsa -d -k=private-key -n=rsa-modulus < msg.rsa > msg.out # # now, both -d and -e are redundant! Encrypt and decrypt are the # same operation just with a different key with Jay's blocking method, # so the usage is now: # # rsa -k=public-key -n=rsa-modulus < msg > msg.rsa # rsa -k=private-key -n=rsa-modulus < msg.rsa > msg.out # # amusingly both versions have backwards compatible usage! (Earlier versions # did not use Travis Kuhn's hack (of omitting @ARGV=($k,$n), and using -x=exp # trick instead), and backwards compatibility was lost at that stage with # those old versions) # # also note that Jay's blocking method even though the usage is # backwards compatible, the encryted code is not, and they can not be # used to decrypt files encrypted with the other. # #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj # # -s was contributed by Jeff Friedl a cool perl hacker # # the use of -k=key and -n=modulus where contributed by Travis Kuhn # (Jeff suggested -s to allow -d and -e only (-d and -e from the days when # -d and -e were used, and not redundant as they are now)). Travis # contribution relies on the fact that when you have -s set, the command line # flag -e=exp results in the perl variable $e being set to "exp". # # The use of -i was contributed by Joey Hess, this one is really rather a neat # trick. The trick is that the perl variable $^I is set to the string after # -i. You should be able to see the $^I used in the actual code lower down, # it is part of the dc command string. This actually makes the code slightly # longer if you neglect the comment (--export-a-crypto-system-sig...), but # it is a very important contribution because it allows code to placed after # #!/bin/perl, and makes the comment (now separated from the code) to be much # more accurate (especially important given the other modifications -p, -0777 # now encoding significant processing). # # -p and -0777 I added at the same time as the Joey's -i, -p saves a print # statement and was one of the things Jays new blocking method enabled; # -0777 is a more compact way of saying undef $/, or "no line processing", # ie $/ is the line separator, 777 is an undefined value which causes perl # to read the whole of standard input in one go. you won't see the $_=<> # (where <> is a synonym for <ARGV> which means read from pseudo file ARGV, # the file after the flags or if there is no file, standard input.) because # the -p does this automatically $/=unpack('H*',$_); # you will recall that we have given the -p flag, and the -0777 flag; # those two flags in combination ensure that $_ holds the whole of # standard input at this point. The above statement therefore # converts standard input into hex, and stores this in the variable # $/. The (re)use of $/ rather than using a normal, variable such as # $m (for message) was a nifty hack to save one byte by Jay! (Note # that the re-use is safe, the perl meaning for $/ is no longer # required as the program has already done all the input it is going # to: it has read the entire standard input into $_) The reason that # this saves a byte is that you can place $/ next to an alphanumeric # in a perl expression without the following alphanumeric being # considered part of the variable name. This saves a space here: # # ...16dio\U$k"SK$m SM$n... # ^ # ...16dio\U$k"SK$/SM$n... # # because $mSM makes perl interpret that as a variable named mSM, # ie as ${mSM}! $_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`; # next a string is piped to dc (the string $^I is replaced with the # string of dc commands coming after the -i on the #! line as # explained above. # # expanding the $^I, the string passed in to dc is in full: # # 16dio\U$k"SK$/SM$n\EsN0p[lN*1 # lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsjxp" # # this is where the business occurs, and in fact it is slightly unfair to # call this whole thing a perl program these days because nearly everything # has now migrated to dc, because of dc's compactness. # # The above expression (now with $^I expanded) is used in the expression: # # $_=`echo exp|dc` # # which results in $_ being set to the result of that string as a # series of dc commands. # # There are a number of stages of evaluation which that string goes # through before dc sees it (perl expression, and shell evaluation): # # Firstly what perl does to it, the perl variables are replaced with # their values, so that is $/ (the hex expanded message), $k (the key # given on the command line), $n (the rsa modulus given on the command # line). Next the \U...\E converts the hex numbers into uppercase # (required for dc, lowercase letters are dc commands). The "s are # passed to the shell (I think), and are needed to protect the line # break and the ^ and two < characters from interpretation as previous # command substitution (^) and input redirect (<) by the shell. # # The " between $k and SK is overloaded also ... an opening quote is # needed somewhere before the first <, it was placed between the $k # and SK to prevent perl interpreting that as a reference to # non-existant perl variable ${kSK}. This saves one more byte as a # space would otherwise be needed. Using Jay's trick of re-using # single char variables would not work in this instance because the # name of the variable k is determined by the fact that the command # line uses the form -k=key. (Unless you fancy typing # rsa -;=11 -n=ca1 instead of rsa -k=11 -n=ca1 that is!) # # So the form dc sees after this is (for particular values of # $/=414141 (encrypting input file `AAA' in ascii), $k=11 (RSA key # hex), $n=ca1 (modulus hex): # # 16dio11SK414141SMCA1sN0p[lN*1 # lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsjxp # # (dc sees and doesn't care about the line feed, the "s were just # required (as far as the way the shell saw the linefeed) to prevent # the shell from seeing the line feed as the end of the command and # just echoing part of the program and then complaining of a malformed # second command. So (going back to the program string (before perl # and the shell have expanded things)): # # 16dio\U$k"SK$/SM$n\EsN0p[lN*1 # lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsjxp" # # breaking down the dc commands... # # # 16dio # ask for hex input and ouput # # \U$k"SK # remember \U is start of upper case conversion done # # by perl, and that " is removed by the shell, this # # stores $k the key into dc register K, (more # # accurately it is pushed onto dc stack K with # # command S, because we are inside a perl upper case # # string area \U...\E, and even if we had written sk # # perl would have converted that to SK. This doesn't # # matter because K can be used with l as if it were # # a register # # $/SM # remember $/ is standard input in hex, use of $/ # # rather than a more normal variable name: $m, to save # # one byte as explained above. Again use of stack # # because this part of the string is converted by # # perl to uppercase. # # $n\EsN # end of upper case region (\E) and store $n (the # # rsa modulus given on the command line) into N. # # upper case N just used for consistency, lower case # # n could have been used as we are now outside the # # \U...\E bracketing. # # 0p # store 0 for later [1] # # and print a "0\n" for even later [2] # # the main recursively called procedure "j": # # [lN*1lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj # # the command is of the form: # # [code-for-j]dsjxp # # which means store this string on the stack, dup (d) and store that # in register j, then execute (x) the copy left on the stack by the # dup, and finally, print the result (p). # # this form is used rather than the clearer: # # [code-for-j]sjljxp # # because it saves 1 byte. # # [lN*1lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsjxp" # # function j takes the argument of the result so far. Perhaps a brief # explanation of Jay's blocking method should go here... but first a brief # RSA explanation... # # RSA encrypt is the operation: m ^ e mod N, that is m raised to the # power of e modulus N, or in perl notation (** = exponent, % = modulus): # # C = M ** e % N # # M is the plain text of the message (viewed as a number, that is the # actual message broken down into numbers less than N, the RSA # modulus), e is the public exponent, N is the RSA modulus, and C # denotes the `ciphertext' # # decrypting is the same operation with d, the private exponent (the # secret key), again in perl notation: # # M = C ** d % N # # note these will be large numbers, N in normal usage with RSA is # typically 1024 bits or greater for security. # # Because of the large numbers, it is necessary to use a more # efficient algorithm than the obvious tmp = (M ** e), then C = tmp % # N. If you didn't the algorithm would take forever (fairly # literally... we're talking years, perhaps hundreds of years for large keys). # # Knuth describes an efficient way of doing modular arithmetic, here # is some pseudo code describing it: # # $ans = 1; # $kbin = split(/./,unpack('B*',pack('H*',$k))); # for ($i=0; $i<$#kbin; i++) # { # $ans = $ans * $ans % $N; # if (substr($kbin,$i,$1) == 1) { $ans = $ans * $M % $N; } # } # return $ans; # # that would only work (not that I've tested it) for small values of # $N etc because things would overflow many times for N up to (or # greater than) 1024 bits. 1024 bits is 128 bytes, and on most # machines an int is 4 bytes. # # the perl code implements the above modular exponentiation algorithm. # # Now moving on to blocking methods, because the basic operation # (M ** k % N) relies on M the message being less than N, if the # message in question happens to be larger than N, it must be split up # into chunks < N. # # Multiple blocks are not often used in practical software applications # of RSA because pure RSA is not often used, more often RSA is used to # form a hybrid crypto system with a conventional cipher. This is the # way PGP works, the conventional cipher PGP uses is IDEA. In this role # RSA would be used only to encrypt a session key, which is usually # plently small enough to fit into a single block when the typical keys # sizes are used. (RSA is not considered secure for keysizes below 1024 # bits or there abouts). # # Because this program (talking about the perl implementation again) # also gets used for smaller keys for testing purposes (key sizes # below those which would not normally be used, and hence insecure) it # was necessary to allow multiple blocks. # # Also another couple of security notes: in those situations where # pure RSA is used, it is necessary to do chaining to reduce # vulnerability to attack. The perl program does not implement # chaining, the multiple blocks are just to keep things working for # small test numbers, the perl program is not intended to be used with # multiple blocks in anger because of the lack of chaining. (It is # not designed to be used in anger at all come to that, efficiency and # security have been traded for size) # # When RSA is used with single blocks to encrypt a session key, it is # necessary to pad the session key (the message) with random padding # to protect against another type of attack. The perl program does # not do this; but it does not prevent the user of the program doing # it either. For example, it is possible to create a PGP compatible # signing script by, combining this script with pgpacket, and with # John Allen's md5 in 8 lines of perl. # # The normal blocking method used is to take as many bytes as will # fit within N, that is floor(log256(N)) bytes for each block. This # creates some message expansion, and means that this blocking must be # undone on decryption. # # This is the blocking method used by all the previous versions of the # program. Jay changed all that (and saved many bytes at the same time!) # # One of Jay's contributions was to suggest the use of base N to split # the message up into blocks less than N. This is mathematically # simpler to express in dc, and so saves quite a few bytes. Naturally # the resulting program produces output incompatible with the previous # blocking method. # # OK, now back to the dc code, we were looking at function j: # # the function j is: # # [lN*1lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsjxp" # # j expects as it's argument (ie left on the stack on entry) the # result so far. Jays' blocking method means that the first thing # that is done is to multiply the current value by N, and leave this # on the stack. Note that this means that the encrypted data blocks # will be in the reverse order to the plaintext blocks. This does not # matter though because they are reversed a second time on decryption, # and keeping this reversing behaviour saves a couple of bytes (these # bytes, and the idea of reversing blocks contributed by Jay also). # # function j broken down: # # lN* # multiply the result so far by N and leave # # on stack for later [3] # # 1 # leave 1 on the stack for later [4] # # lK # store K, the key on the stack # # at this point function X is called, this function converts the key into # binary, and performs Knuth's modular exponentiation algorithm, function X # is inserted inline in the code for function j, this saves a few bytes # over defining the function outside. # # function X is: # # [d2%Sa2/d0<X+d*lMLa^*lN%0] # # again the form [code-for-X]dsXx is used to save 1 byte in place of # [code-for-X]sXlXx # # breaking down function X: # # d2%Sa2/d0<X # modulus 2 operation and store result # # on stack a, at same time divide by 2 # # and if the remaining key being converted # # is greater than 0 call X recursively # # when the recursion reaches it's deepest # # a will hold the key in binary with the # # least significant bit on top of the stack # # + # eat a 0 which will be left after the # # recursion, we add a 0 to cater for this # # necessary eating of the first 0 at the end # # of function X below [5] # # d* # this calculates ans = ans * ans; # # the 1 stored above [4] is required to # # prime the stack with the start answer of 1 # # lMla^* # this optionally multiplies by M (depending # # on the value of the next bit of the key # # which is stored in binary on stack a) The # # use of the lMla^ was Jay's contribution, and # # saved 5 bytes. The way that it works is # # that Knuth's algorithm (as described above) # # requires that the ans should also # # be multiplied by M in cases where the next # # bit of the key viewed in binary is 1. # # The trick is that lMla^* multiplies by 1 # # (a nop) if the next bit is 0, and multiplies # # by the required M if the next bit is 1! # # lN% # modulus N # # 0 # 0 required to feed the 0 eaten above [5] # # and so ends the description of function X, the function is stored in # X, and executed by these commands # # dsXx # call X # continuing with function j... remembering the surplus 0 left by [5] # + # eat the 0 left by [5] # # exposing the previous block multiplied by N # # + # add the result of the current block # # the 0 pushed at the begining [1] is # # required to prime the current result to # # be 0 the first time through # # lMlN/dsM # discard the processed block (M = M / N) # # also the dup (d) leaves the value of M # # calculated on the stack for the following # # comparison # # 0<j # if greater than 0, call function j # # recursively to encrypt the next block # # and so ends the description of function j, the function is stored in # j, and executed by these commands # # dsjx # call j # # # p # print out result left on stack # # end of dc string! # # now back to perl with the result of the dc invocation left in $_ s/\W//g; # the above strips trailing "\\n"s from GNU dc style output # ie GNU dc output for extra long numbers looks something like this: # # 0ABCDAFEAFDA98DFBCA1341341234123413240981730498138904\ # BCA1341341234123413240981730498138904 # # the above removes the trailing \, and the newline char. # As a fun double use, the removal of whitespace (newlines) is also required # by the p [1] (which you will find here: # # $_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 # ^ # ) # # this is required later for the following expression # # /((..)*)$/ # # it prints a 0 followed by a carriage return, then dc is invoked # and prints out the result as a large hex number (possibly with # trailing \s and broken over multiple lines if GNU dc is being used) # # the removal of whitespace therefore doubles to save some bytes from # the otherwise required printing of "0" or combining of "0" tacked # onto the front of the result from dc. $_=pack('H*',/((..)*)$/) # the /((..)*)$/ is required to account for odd numbers of hex digits, # it (together with the "0" printed by the p explained above [1]) ensures # that it is the leading hex digit which is packed with a "0" rather # than the perl pack default of the trailing digit. This is necessary # otherwise things don't decrypt as the encryption would be multiplied # by 16! # # the expression /((..)*)$/ (in case you were wondering) actually # evaluates to a list with two values in it, pack because it only has # one format specification only pays attention to the first element of # this list (fortunately, as this saves another byte).

Comments, html bugs to me (Adam Back) at <adam@cypherspace.org>