The script takes one required parameter, which can be either the name of a file or a string, plus an optional second parameter naming a variable. The hash (digest) is written to the console and, if a variable name is given, also stored in that variable. First, the source code:
:: This script implements the MD5 Message-Digest Algorithm in accordance with
:: RFC 1321, which can be found at http://www.ietf.org/rfc/rfc1321.txt.
:: The only known limitation is that files to be hashed by this script must
:: be less than 256 MB in size.
@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
IF "%~1"=="" (
echo md5 ^<string/filename^> [^<variable^>]
echo Calculates the MD5 message-digest of the input string or file.
echo Prints the result to stdout and optionally stores it in 'variable'.
exit /b
)
set "inFile=%~1"
:: Table T from RFC 1321: integer part of 4294967296*abs(sin(n)) for n=1..64, n in radians
set i=0
FOR %%a IN (d76aa478 e8c7b756 242070db c1bdceee f57c0faf 4787c62a a8304613 fd469501
698098d8 8b44f7af ffff5bb1 895cd7be 6b901122 fd987193 a679438e 49b40821
f61e2562 c040b340 265e5a51 e9b6c7aa d62f105d 02441453 d8a1e681 e7d3fbc8
21e1cde6 c33707d6 f4d50d87 455a14ed a9e3e905 fcefa3f8 676f02d9 8d2a4c8a
fffa3942 8771f681 6d9d6122 fde5380c a4beea44 4bdecfa9 f6bb4b60 bebfbc70
289b7ec6 eaa127fa d4ef3085 04881d05 d9d4d039 e6db99e5 1fa27cf8 c4ac5665
f4292244 432aff97 ab9423a7 fc93a039 655b59c3 8f0ccc92 ffeff47d 85845dd1
6fa87e4f fe2ce6e0 a3014314 4e0811a1 f7537e82 bd3af235 2ad7d2bb eb86d391
) DO set /a "Radians[!i!]=0x%%a, i+=1"
:: Per-round rotation amounts
set i=0
FOR %%a IN (7 12 17 22 7 12 17 22 7 12 17 22 7 12 17 22
5 9 14 20 5 9 14 20 5 9 14 20 5 9 14 20
4 11 16 23 4 11 16 23 4 11 16 23 4 11 16 23
6 10 15 21 6 10 15 21 6 10 15 21 6 10 15 21
) DO set /a "Shift[!i!]=%%a, i+=1"
:: Powers of 2 lookup table for custom right shift operation
:: Only includes the values we could possibly need, to keep ENV size down.
set /a Powers2[3]=8, Powers2[4]=16, Powers2[5]=32, Powers2[6]=64, Powers2[8]=256
set /a Powers2[9]=512, Powers2[10]=1024, Powers2[11]=2048, Powers2[13]=8192
set /a Powers2[14]=16384, Powers2[15]=32768, Powers2[16]=65536, Powers2[19]=524288
set /a Powers2[20]=1048576, Powers2[21]=2097152, Powers2[22]=4194304
:: Output digest initialization
set /a digA0=0x67452301, digB0=0xefcdab89, digC0=0x98badcfe, digD0=0x10325476
:: Map for dec->hex conversion
set "hexMap=0123456789ABCDEF"
:: If the input does not name an existing file, it is treated as a string to be hashed. The string is first
:: written to a temp file without a trailing newline (CR/LF), then the script recursively calls itself to hash that file.
IF NOT EXIST "!inFile!" (
set recursed=true
echo|set /p ans="%inFile%">"%temp%\_md5.tmp"
CALL "%~f0" "%temp%\_md5.tmp" %~2
del "%temp%\_md5.tmp"
:: Pass the result back over the second ENDLOCAL barrier
IF NOT "%~2"=="" FOR %%a IN (!md5Digest!) DO ENDLOCAL& set %~2=%%a
exit /b
)
:: Abort with an error if the input file is larger than 268,435,455 bytes (see block padding description below)
IF %~Z1 GTR 268435455 (
echo ERROR: File size must be ^<= 268,435,455 bytes ^(256 MB^)
exit /b
)
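:: Convert the input file into a hexadecimal representation and split it into 512-bit little endian blocks.
:: fc /B against an all-zero file of the same size lists the offset and value of every non-zero byte;
:: any gap between reported offsets is filled with 00 bytes.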
set "tempFile=%temp%\#.tmp"
del "%tempFile%" 2>NUL
fsutil file createnew "%tempFile%" %~Z1 >NUL
set /A i=0, d=0
set block=
set emptyfile=false
for /F "skip=1 tokens=1,2 delims=: " %%b in ('fc /B "%inFile%" "%tempFile%"') do (
IF NOT %%c==no (
set /A b=0x%%b
if !i! neq !b! (
set /A c=b-i
for /L %%i in (!c!,-1,1) do (
set "block=00!block!"
set /A d+=1
if !d! geq 64 (
set /a d=0
CALL:do_md5 !block!
set block=
)
)
)
set "block=%%c!block!"
set /A i=b+1, d+=1
if !d! geq 64 (
set /a d=0
CALL:do_md5 !block!
set block=
)
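)
)
:: Zero-pad any trailing 0x00 bytes, which fc /B does not report. This also covers files made entirely
:: of zero bytes, for which fc reports "no differences encountered"; an empty file has i equal to 0 and is skipped.
if !i! neq %~Z1 (
set /A c=%~Z1-i
for /L %%i in (!c!,-1,1) do (
set "block=00!block!"
set /A d+=1
if !d! geq 64 (
set /a d=0
CALL:do_md5 !block!
set block=
)
)
)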
:: Some portion of a 512-bit block will be leftover and needs to be padded before processing.
:: First pad the block with a '1' bit and seven '0' bits (0x80)
set "block=80!block!"
set /a d+=1
if !d! geq 64 (
set /a d=0
CALL:do_md5 !block!
set block=
)
:: If the block is now larger than 448 bits, it needs to be zero-padded to 512, hashed, and
:: a second empty block created to hold the length field.
IF !d! GTR 56 (
FOR /L %%a IN (!d!,1,63) DO set "block=00!block!"
set /a d=0
CALL:do_md5 !block!
set block=
)
:: Zero-pad the remaining block 4 bits at a time until the total length is 448 bits (512-64)
set /a lenBits=!d!*8
FOR /L %%a IN (!lenBits!,4,444) DO set "block=0!block!"
:: The final 64 bits of padding are set equal to the original size in bits of the input.
:: Since cmd only handles 32-bit numbers, we assume the upper 32-bits are zero and the size
:: is stored in the lower 32 bits only. This supports files up to ~256 MB, much larger than
:: anyone with any sense would ever use this script on.
set /a dec=%~Z1*8
set "hex="
FOR /L %%N IN (1,1,8) do (
set /a "d=dec&15,dec>>=4"
FOR %%D in (!d!) do set "hex=!hexmap:~%%D,1!!hex!"
)
:: Hash the final block
CALL:do_md5 00000000!hex!!block!
:: Convert the MD5 digest chunks to hex
FOR %%a IN (digA0 digB0 digC0 digD0) DO (
set /a dec=!%%a!
set "%%ahex="
FOR /L %%N IN (1,1,8) do (
set /a "d=dec&15,dec>>=4"
FOR %%D in (!d!) do set "%%ahex=!hexmap:~%%D,1!!%%ahex!"
)
)
:: Rearrange the hex to create the little endian output value
set md5Digest=
FOR %%a IN (digA0hex digB0hex digC0hex digD0hex) DO (
FOR /L %%b IN (6,-2,0) DO (
set thisByte=!%%a:~%%b,2!
set "md5Digest=!md5Digest!!thisByte!"
)
)
echo !md5Digest!
del "%temp%\#.tmp"
IF NOT "%~2"=="" IF NOT !recursed!==true (ENDLOCAL& set %~2=%md5Digest%) ELSE (ENDLOCAL& set md5Digest=%md5Digest%)
exit /b
:do_md5
set data=%~1
:: Break the input block into sixteen 32-bit words
FOR /L %%a IN (0,1,15) DO (
set /a "startPos=(15-%%a)*8"
FOR %%b IN (!startPos!) DO set M[%%a]=!data:~%%b,8!
)
:: Initialize this chunk's digest to the running digest value
set /a digA'=!digA0!, digB'=!digB0!, digC'=!digC0!, digD'=!digD0!
:: Main md5 loop
FOR /L %%i IN (0,1,63) DO (
IF %%i LEQ 15 (
set /a "F=!digD'! ^^ (!digB'! & (!digC'! ^^ !digD'!))"
set /a g=%%i
) ELSE IF %%i LEQ 31 (
set /a "F=!digC'! ^^ (!digD'! & (!digB'! ^^ !digC'!))"
set /a "g=(5*%%i+1)%%16"
) ELSE IF %%i LEQ 47 (
set /a "F=!digB'! ^^ !digC'! ^^ !digD'!"
set /a "g=(3*%%i+5)%%16"
) ELSE IF %%i LEQ 63 (
set /a "F=!digC'! ^^ (!digB'! | (!digD'! ^^ 0xFFFFFFFF))"
set /a "g=(7*%%i)%%16"
)
set /a tempD=!digD'!
set /a digD'=!digC'!, digC'=!digB'!
FOR %%g IN (!g!) DO set /a "msg=0x!M[%%g]!"
set /a X=!digA'!+!F!+!Radians[%%i]!+!msg!
:: Because cmd does an arithmetic (sign-extending) right shift instead of a logical right shift,^
:: 'negative' numbers would give incorrect shift results. This implementation performs a proper^
:: logical right shift on negative numbers by clearing the sign bit before shifting and adding it^
:: back at its shifted position afterwards.
IF !X! LSS 0 (
set /a "tempX=!X! & 0x7FFFFFFF"
set /a rShiftAmt=32-!Shift[%%i]!
set /a "tempB=!tempX! >> !rShiftAmt!"
FOR %%j IN (!rShiftAmt!) DO set /a power=31-%%j
FOR %%k IN (!power!) DO set /a tempB+=!Powers2[%%k]!
set /a "digB'+=((!X! << !Shift[%%i]!) | !tempB!)"
) ELSE set /a "digB'+=((!X! << !Shift[%%i]!) | (!X! >> (32-!Shift[%%i]!)))"
set /a digA'=!tempD!
)
:: Add this chunk's result to the running digest value
set /a digA0+=!digA'!, digB0+=!digB'!, digC0+=!digC'!, digD0+=!digD'!
exit /b
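For anyone who finds the batch hard to follow, here is a rough Python transliteration of the same steps, meant as a cross-checking sketch rather than a replacement. It covers the sine-derived constants (checkable against the Radians list), the rotation table, the padding, the four round functions with the same F and g formulas as `:do_md5`, and the little-endian output. The names `md5_hex`, `to_signed32`, `rotl_like_cmd` and `POWERS2_NEEDED` are made up for this sketch. `rotl_like_cmd` emulates the script's workaround for cmd's arithmetic right shift on signed 32-bit values. Unlike the script, it writes the full 64-bit length field, so it has no 256 MB limit.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

# Radians[] in the script: integer part of 2**32 * |sin(i + 1)|, i in radians
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]
# Shift[] in the script: per-round rotation amounts
SHIFT = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Exponents the Powers2[] table must cover: the sign bit lands at bit (shift - 1)
POWERS2_NEEDED = sorted({s - 1 for s in SHIFT})


def to_signed32(v: int) -> int:
    """Reinterpret the low 32 bits of v as a signed value, like cmd's set /a."""
    v &= MASK
    return v - (1 << 32) if v & 0x80000000 else v


def rotl_like_cmd(x: int, s: int) -> int:
    """Rotate a signed 32-bit x left by s using only arithmetic shifts."""
    if x < 0:
        r = 32 - s
        # Clear the sign bit, shift, then put the sign bit back at bit 31 - r
        low = ((x & 0x7FFFFFFF) >> r) + (1 << (31 - r))
    else:
        low = x >> (32 - s)
    return ((x << s) | low) & MASK


def md5_hex(data: bytes) -> str:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Pad: 0x80, zeros up to 56 mod 64, then the bit length as 64-bit little endian
    bit_len = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF
    msg = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64) + struct.pack("<Q", bit_len)
    for off in range(0, len(msg), 64):
        m = struct.unpack("<16I", msg[off:off + 64])  # sixteen little-endian words
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                f, g = d ^ (b & (c ^ d)), i
            elif i < 32:
                f, g = c ^ (d & (b ^ c)), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | (d ^ MASK)), (7 * i) % 16
            x = to_signed32(a + f + K[i] + m[g])
            a, d, c = d, c, b  # A <- old D, D <- old C, C <- old B
            b = (b + rotl_like_cmd(x, SHIFT[i])) & MASK
        a0, b0, c0, d0 = (a0 + a) & MASK, (b0 + b) & MASK, (c0 + c) & MASK, (d0 + d) & MASK
    # Each 32-bit word is emitted low byte first, as the script's final loop does
    return struct.pack("<4I", a0, b0, c0, d0).hex()


if __name__ == "__main__":
    for text in ["", "abc", "message digest"]:
        print(md5_hex(text.encode()).upper(), hashlib.md5(text.encode()).hexdigest().upper())
```

Running it prints each digest next to the one from Python's built-in `hashlib` for comparison.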
Here are the results of running the RFC 1321 test suite with a separate test script, md5test:
c:\Users\Marc\Desktop\md5>md5test
*** Test 1: MD5('emptystring')
D41D8CD98F00B204E9800998ECF8427E - result
d41d8cd98f00b204e9800998ecf8427e - expected
Time: 00:00:00.17
*** Test 2: MD5(a)
0CC175B9C0F1B6A831C399E269772661 - result
0cc175b9c0f1b6a831c399e269772661 - expected
Time: 00:00:00.22
*** Test 3: MD5(abc)
900150983CD24FB0D6963F7D28E17F72 - result
900150983cd24fb0d6963f7d28e17f72 - expected
Time: 00:00:00.20
*** Test 4: MD5(message digest)
F96B697D7CB7938D525A2F31AAF161D0 - result
f96b697d7cb7938d525a2f31aaf161d0 - expected
Time: 00:00:00.22
*** Test 5: MD5(abcdefghijklmnopqrstuvwxyz)
C3FCD3D76192E4007DFB496CCA67E13B - result
c3fcd3d76192e4007dfb496cca67e13b - expected
Time: 00:00:00.20
*** Test 6: MD5(ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789)
D174AB98D277D9F5A5611C2C9F419D9F - result
d174ab98d277d9f5a5611c2c9f419d9f - expected
Time: 00:00:00.29
*** Test 7: MD5(12345678901234567890123456789012345678901234567890123456789012345678901234567890)
57EDF4A22BE3C955AC49DA2E2107B67A - result
57edf4a22be3c955ac49da2e2107b67a - expected
Time: 00:00:00.31
c:\Users\Marc\Desktop\md5>
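The "expected" lines above come from the test suite in RFC 1321, appendix A.5. A short Python sketch like the one below can regenerate them with the standard `hashlib` module. The script prints uppercase hex while the reference values are lowercase, so a harness should compare case-insensitively. The `matches` and `expected_digest` helpers are hypothetical and are not part of md5test.

```python
import hashlib

# Inputs from the transcript above (RFC 1321, appendix A.5)
RFC1321_SUITE = [
    "",
    "a",
    "abc",
    "message digest",
    "abcdefghijklmnopqrstuvwxyz",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
    "1234567890" * 8,
]


def expected_digest(text: str) -> str:
    """Reference MD5 of an ASCII string, as lowercase hex."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def matches(result: str, expected: str) -> bool:
    """Compare hex digests case-insensitively (the batch script prints uppercase)."""
    return result.strip().lower() == expected.strip().lower()


if __name__ == "__main__":
    for text in RFC1321_SUITE:
        print(f"{expected_digest(text)} - MD5({text!r})")
```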
One common use is password storage. A user enters their password, let's say it's "password", and you run it through MD5 (preferably after salting, but that's a different topic) to obtain a hash like "5F4DCC3B5AA765D61D8327DEB882CF99". That hash can be stored in the clear, in a text file, etc., because there is no practical way to compute the original "password" back from it. The catch is that common unsalted passwords like this one already appear in precomputed lookup tables, which is exactly what salting defends against. When a user later tries to log in to your system, you hash the password they provide and compare that hash to the value you saved; if they match, the password must have been correct. It is nearly impossible* for someone to come up with another string, such as "DONUT", that produces the same hash as the string "password". (* Again, see the notes below.)
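The store-and-compare flow just described might look like this in Python. It is only a sketch: prefixing a random 16-byte salt to the password is an assumption made for illustration, and `hash_password` and `check_password` are hypothetical names. For real systems, a deliberately slow function such as `hashlib.pbkdf2_hmac` is the usual choice rather than a single MD5.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Return (salt, hex digest) for MD5(salt + password); a new random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.md5(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


def check_password(attempt: str, salt: bytes, stored_digest: str) -> bool:
    """Re-hash the login attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(attempt, salt)
    return hmac.compare_digest(digest, stored_digest)


if __name__ == "__main__":
    salt, stored = hash_password("password")  # saved at sign-up
    print(check_password("password", salt, stored))  # True
    print(check_password("DONUT", salt, stored))  # False
```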
Another common use is file validation: before running, a batch file could hash its own code, or the contents of a configuration file, etc., and compare the resulting hash to a "known good" value. If the hashes don't match, the file in question has been modified, whether by tampering or by accident. I'm sure you've all seen the MD5 checksums posted next to downloads, which are intended to verify that no corruption occurred during the download.
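Here is a minimal Python sketch of the file-validation idea. It hashes the file in chunks, so large files never need to fit in memory, and compares the result against a known-good value. The function names and the 64 KiB chunk size are assumptions for illustration. The expected value is lowercased before comparing because published checksums, like this batch script's output, may be uppercase.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hex MD5 of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unmodified(path: str, known_good: str) -> bool:
    """True if the file's MD5 matches the known-good digest (case-insensitive)."""
    return file_md5(path) == known_good.strip().lower()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abc")
    try:
        print(is_unmodified(tmp.name, "900150983CD24FB0D6963F7D28E17F72"))  # True
    finally:
        os.remove(tmp.name)
```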
NOTES: You might have heard that MD5 is cryptographically "broken", which is absolutely true. It means there are attacks that produce a hash collision, i.e. two different inputs with the same hash, in far less than brute-force time. Strictly speaking, a collision attack lets the attacker craft both inputs together; finding a new string such as "DONUT" that matches the hash of an existing one like "password" is a (second-)preimage attack, which is still impractical against MD5. Nowadays MD5 is broken pretty thoroughly and shouldn't be used in real security-critical situations. However, that doesn't mean it's completely useless, and it's certainly plenty of security for something as trivial as a batch file. It's also a lot better than the simple substitution codes first suggested in the string encoding thread.
Anyway, if anyone would like to take a look through the code and suggest any efficiency improvements, they'd be most welcome. Unfortunately the algorithm is strictly serial, since each 512-bit block starts from the digest left by the previous block, so it cannot be sped up by multithreading like my AES implementation. Currently it processes a 1 KB file in about 1.3 seconds and a 100 KB file in about 3 minutes, i.e. roughly 1.8 seconds per KB instead of 1.3 (not sure why it scales so poorly...). You really don't want to use this on large files; for anything more than "batch tinkering because it's fun" you should get a JScript or C implementation. But where's the fun in that?!
Thanks all!