Multi-process Advanced Encryption Standard (AES)

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Message
Author
Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: Multi-process Advanced Encryption Standard (AES)

#16 Post by Magialisk » 08 Nov 2013 02:38

Einstein,
I believe I'm mostly done digesting your massive post. Please if I've overlooked something let me know.

I can see that in the first parts you're showing that a FOR loop executes much faster than a GOTO loop. No offense meant, but I believe that should be considered common knowledge, or at least best practice. Unfortunately, a FOR loop has it's limits, and not all GOTO loops can be rewritten as FOR loops. Either way thank you for taking the time to measure the performance deltas. It's very interesting information.

Moving on to the middle parts, I see you measured the performance of "set A=B & set Y=Z" vs. "set A=B, Y=Z". Additionally you tested minimizing the total number of 'set' commands by placing more comma-separated items on a single line. This provided a pretty striking difference in timing, so even though these sets only have to run once in my control program, I'm going to use your improvement for sure. Thank you very much for pointing this out! If I'm lucky and under the 8k character limit I might even be able to get a whole table on one line using %\n% continuation...

Finally at the end I see you're showing that running the 'set' operations in the child thread is not wise. I came to that same realization a couple days ago and moved as much code as possible out of the child and into the parent. That was one of the first and largest optimizations that I made to the code after I posted it, however I haven't yet posted the modified version because I've been so busy implementing other optimizations.

Thank you again for taking so much time to point out these optimizations. I'm curious about some of the type, findstr, map and lookup improvements you mentioned. I wouldn't ask you to write a bunch of code, but maybe a couple line example of what you're referring to? I can do the dirty work from there. :D

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: Multi-process Advanced Encryption Standard (AES)

#17 Post by Magialisk » 08 Nov 2013 03:20

Just wanted to let everyone know that I updated the first several posts in this thread with the new, much more efficient code. I haven't done much to optimize the main control program except implement a couple of the suggestions out of this thread. Most of my efforts have been on the workhorse 'aescore', rewriting algorithms and other large-scale changes to optimize performance. I believe it's very tight and efficient now, but there's always something left to squeeze out so I'd welcome any new feedback. Thanks to everyone who's already contributed!

einstein1969
Expert
Posts: 960
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: Multi-process Advanced Encryption Standard (AES)

#18 Post by einstein1969 » 08 Nov 2013 07:13

Magialisk wrote:Einstein,
I believe I'm mostly done digesting your massive post. Please if I've overlooked something let me know.

I can see that in the first parts you're showing that a FOR loop executes much faster than a GOTO loop. No offense meant, but I believe that should be considered common knowledge, or at least best practice. Unfortunately, a FOR loop has it's limits, and not all GOTO loops can be rewritten as FOR loops. Either way thank you for taking the time to measure the performance deltas. It's very interesting information.

Moving on to the middle parts, I see you measured the performance of "set A=B & set Y=Z" vs. "set A=B, Y=Z". Additionally you tested minimizing the total number of 'set' commands by placing more comma-separated items on a single line. This provided a pretty striking difference in timing, so even though these sets only have to run once in my control program, I'm going to use your improvement for sure. Thank you very much for pointing this out! If I'm lucky and under the 8k character limit I might even be able to get a whole table on one line using %\n% continuation...

Finally at the end I see you're showing that running the 'set' operations in the child thread is not wise. I came to that same realization a couple days ago and moved as much code as possible out of the child and into the parent. That was one of the first and largest optimizations that I made to the code after I posted it, however I haven't yet posted the modified version because I've been so busy implementing other optimizations.

Thank you again for taking so much time to point out these optimizations. I'm curious about some of the type, findstr, map and lookup improvements you mentioned. I wouldn't ask you to write a bunch of code, but maybe a couple line example of what you're referring to? I can do the dirty work from there. :D


I saw the new code. Well done Magialisk!

In my tests not I proved the efficiency of the "FOR", but the efficiencies of the "Block" (). The "FOR" was used only to calculate the best timing.

This code:

Code: Select all

(
  set /a x=4
  set /a r=5
)


generally, run faster than this:

Code: Select all

  set /a x=4
  set /a r=5


There are cases in which instead goes more slowly. You have to do some testing!

I'm working on new code!

Einstein1969
Last edited by einstein1969 on 09 Nov 2013 10:27, edited 2 times in total.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Multi-process Advanced Encryption Standard (AES)

#19 Post by dbenham » 08 Nov 2013 07:45

einstein1969 wrote:This code:

Code: Select all

(
  set /a x=4
  set /a r=5
)

generally, run faster than this:

Code: Select all

  set /a x=4
  set /a r=5

There are cases in which instead goes more slowly. You have to do some testing!


I haven't tested, but I should think the following would be fastest:

Code: Select all

set /a "x=4, r=5"


Dave Benham

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: Multi-process Advanced Encryption Standard (AES)

#20 Post by jeb » 08 Nov 2013 08:20

dbenham wrote:I haven't tested, but I should think the following would be fastest:
Code:
set /a "x=4, r=5"


Yes, and it's much faster :!:
My test with

Code: Select all

 set /a key[0][%%a]=!key[0][%%b]!^^^^!key[0][%%a]!
         set /a key[1][%%a]=!key[1][%%b]!^^^^!key[1][%%a]!
         set /a key[2][%%a]=!key[2][%%b]!^^^^!key[2][%%a]!
         set /a key[3][%%a]=!key[3][%%b]!^^^^!key[3][%%a]!


versus

Code: Select all

 set /a key[0][%%a]=!key[0][%%b]!^^^^!key[0][%%a]!, key[1][%%a]=!key[1][%%b]!^^^^!key[1][%%a]!, key[2][%%a]=!key[2][%%b]!^^^^!key[2][%%a]!, key[3][%%a]=!key[3][%%b]!^^^^!key[3][%%a]!

I got ~60% faster execution.

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: Multi-process Advanced Encryption Standard (AES)

#21 Post by Magialisk » 08 Nov 2013 10:22

You guys are amazing! This is why I love this forum. Of all the places to get a performance improvement I never would have looked at the 'set' commands. This code as a whole is almost entirely sets (thousands and thousands of them) with only a little bit of looping and branching in between in between. I chalked the sets up as a lost cause that the parser just needs to chug through, and other than eliminating as many as possible (which as you can see made large improvements in my new algorithms) I wasn't trying to "optimize" the ones that remained.

I will go back into the code this weekend and replace all instances of 'set A & set B' or 'set A <CRLF> set B' with 'set A, B'. I almost feel upset that I don't have to work today, since I can't test the improved code on my work PC to post standardized timing results. That will have to wait until Monday :D

einstein1969
Expert
Posts: 960
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: Multi-process Advanced Encryption Standard (AES)

#22 Post by einstein1969 » 09 Nov 2013 08:50

Hi Magialisk,

There is good news!

If you load only the variables needed by the execution time of the most expensive part falls by a further 50% (about).

You can avoid loading the variables G11 G13 G14 G9 S' if you are in the process of Encrypt and viceversa.

For more information: Why does SET performance degrade as environment size grows?

The result of test

overload env:

Code: Select all

PID 2 Execution time of part[4-5]: 58 hundredths of a second


Only necessary env:

Code: Select all

PID 2 Execution time of part[4-5]: 28 hundredths of a second


I'm working on next speedup! (not sparse monodimensional fixed size arrays of integer)

EDIT:
original:

Code: Select all

>aes test_aes.txt aes.key

Converting input file to ASCII-coded hexadecimal...
...Hexadecimal Conversion Complete (Elapsed Time: 00:00:02.92)
Beginning AES Encryption with 2 PIDs...
...AES Encryption Complete (Elapsed Time: 00:01:50.31)

>aes test_aes.txt.aes aes.key

Beginning AES Decryption with 2 PIDs...
...AES Decryption Complete (Elapsed Time: 00:01:56.44)
Restoring binary file from ASCII-coded hexadecimal...
...Binary file restoration complete (Elapsed Time: 00:00:00.64)


with this tip:

Code: Select all

del test_aes(1).txt
call aes test_aes.txt aes.key
echo(
call aes test_aes.txt.aes aes.key
echo(
fc /B test_aes.txt test_aes(1).txt

Converting input file to ASCII-coded hexadecimal...
...Hexadecimal Conversion Complete (Elapsed Time: 00:00:01.24)
Beginning AES Encryption with 2 PIDs...
...AES Encryption Complete (Elapsed Time: 00:01:04.16)

Beginning AES Decryption with 2 PIDs...
...AES Decryption Complete (Elapsed Time: 00:01:34.31)
Restoring binary file from ASCII-coded hexadecimal...
...Binary file restoration complete (Elapsed Time: 00:00:00.22)

Confronto in corso dei file test_aes.txt e TEST_AES(1).TXT
FC: nessuna differenza riscontrata


Einstein1969
Last edited by einstein1969 on 10 Nov 2013 10:43, edited 7 times in total.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Multi-process Advanced Encryption Standard (AES)

#23 Post by dbenham » 09 Nov 2013 09:04

einstein1969 wrote:In my tests not I proved the efficiency of the "FOR", but the efficiencies of the "Block" (). The "FOR" was used only to calculate the best timing.

This code:

Code: Select all

(
  set /a x=4
  set /a r=5
)

generally, run faster than this:

Code: Select all

  set /a x=4
  set /a r=5

There are cases in which instead goes more slowly. You have to do some testing!

:shock: Amazing, I never guessed the above would be true :!:

I assumed there would not be a substantive difference between the two since the same amount of code must be parsed, and each is parsed exactly once.

I had a hard time following your test code from a few posts back, so I wrote my own simplified test suite:

Code: Select all

@echo off
setlocal
set "open= "
set "close= "
:loop
set "beg=%time%"

for /l %%N in (1 1 1000) do call :test

set "end=%time%"
set /a t1=t2=0
for /f "tokens=1-4 delims=:." %%a in ("%beg: =0%") do set /a t1=(((1%%a*60)+1%%b)*60+1%%c)*100+1%%d-36610100
for /f "tokens=1-4 delims=:." %%a in ("%end: =0%") do set /a t2=(((1%%a*60)+1%%b)*60+1%%c)*100+1%%d-36610100
set /a "diff=t2-t1"
if %diff% lss 0 set /a diff+=24*60*60*100

echo %open%%diff%%close%

if "%open%" equ "(" exit /b
set "open=("
set "close=)"
goto loop

:test
%open%
set /a x=0
set /a x=1
set /a x=2
set /a x=3
set /a x=4
set /a x=5
set /a x=6
set /a x=7
set /a x=8
set /a x=9
%close%
exit /b

Here are the results of two runs on my Win 7 64 bit machine:

Code: Select all

C:\test>test
 507
(247)

C:\test>test
 505
(246)

And here are results of two runs on a Virtual XP 32 bit machine:

Code: Select all

Z:\test>test
 3763
(2500)

Z:\test>test
 3685
(2524)

I am really surprised adding parentheses provides such dramatic speed improvement. It makes me question my assumptions as to the extent macros provide a speed boost. I still believe in macros, but some of the performance boost I attributed to macros could simply have been the result of using parentheses.


Dave Benham

einstein1969
Expert
Posts: 960
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: Multi-process Advanced Encryption Standard (AES)

#24 Post by einstein1969 » 10 Nov 2013 08:36

Magialisk wrote:As I said before I'd really love to hear what you guys think. I know there's a lot up there to digest, but if anyone feels like picking it apart constructive feedback is welcome. Especially anything that makes it run faster or gets rid of that pesky System warning when decrypting would be very much appreciated. :D



The problem is in this part of code:

Code: Select all

@echo off

(
   :: Walk the hexFile, copying everything but the last line to a temp file.
   :: The last line needs to have its padding characters removed.
)


Result: (in my language)

Code: Select all

Impossibile trovare l'unità specificata.


But i don't know why :? . An expert is required!

The solution is use rem, use single :: or insert a rem between :: or use a caret (^) at the end of first :: (in the original code) or ...

Einstein1969

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: Multi-process Advanced Encryption Standard (AES)

#25 Post by jeb » 10 Nov 2013 11:19

einstein1969 wrote:But i don't know why :? . An expert is required!


Ok, that's simple.

There exists a two lines rule inside of brackets when using labels.
SO:Unexpected “The system cannot find the drive specified.” in batch file

Aacini
Expert
Posts: 1913
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Multi-process Advanced Encryption Standard (AES)

#26 Post by Aacini » 10 Nov 2013 23:59

I reviewed the AES FIPS-197 document and I like it! The data manipulation involved is convoluted and complex, so it is difficult to achieve it in an efficient way, even in assembler language. In the case of Batch files, we can use some text substitution tricks in order to compress more operations into FOR commands, so they run faster. However, most of the modifications I propose can not be easily included in the existent code unless it be extensively modified, so I decided to write a complete working program that include all modifications, at least in a preliminary version. In the case of complex data manipulations it is easier to review working code instead of read descriptions about it. I only developed the encrypt part in the program; I will complete the decrypt part later.

Some of my modifications can be directly included in the existing code, like the generation of factor tables, so the program will drastically reduce its size and run a little bit faster. I tried to completely eliminate these constants and insert the equivalent operation in the SET /A commands, but I can't find such equivalent operation. The AES documentation specify that is "modulo 0x11B", but this doesn't works. @Magialisk: how did you calculated the G2, G3, etc. constants? Why you don't use the original operation instead of the precomputed constants?

Nov-18-2013: I modified the code below in these points:
- Generate Galois constants via a formula instead of converting pre-defined values (strings).
- Achieve the BinToHex conversion via Batch code (instead of JScript).
- Added a multi-process encryption method.

Code: Select all

@echo off
setlocal DisableDelayedExpansion
set "bang=!"
setlocal EnableDelayedExpansion

rem Multi-process dispatcher:
set "param=%~1"
if "!param:~0,1!" equ ":" (
   shift
   goto !param:~1!
)

rem Implementation of AES FIPS-197 cryptographic algorithm, multi-process encryption version
rem http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
rem Antonio Perez Ayala

if "%~2" neq "" goto begin
echo AES.BAT file.ext hexKey [pidNum]
echo/
echo   file.ext  Name of file to encrypt/decrypt:
echo             - Not .aes extension: encrypt any file, produce file.ext.aes
echo             - .aes extension: decrypt an .aes file, produce original file.ext
echo   hexKey    String of 32, 48 or 64 hexadecimal digits, the same used in encrypt
echo             must be used for decrypt.
echo   pidNum    Number of parallel encryption processes, defaults to CPU cores - 1.
goto :EOF

:begin

rem To trace intermediate results, delete the next line
set TRACE=rem

if not exist "%~1" echo File not found & goto :EOF
set "fileName=%~N1"
set fileExt=%~X1
set fileSize=%~Z1

set "key=%~2"
for %%a in (64 48 32) do (
   if "!key:~%%a,1!" equ "" (
      set /A b=%%a-1
      for %%b in (!b!) do if "!key:~%%b,1!" neq "" set keyLen=%%a& goto setParams
   )
)
echo Bad key lenght & goto :EOF

:setParams
set /A Nb=4, blockLen=8*Nb-2, Nk=keyLen/8, keyLen-=2, Nr=Nk+6, N=NUMBER_OF_PROCESSORS-1
if "%~3" neq "" set N=%~3
if %N% lss 1 set N=1

rem Define a decimal-to-hexadecimal conversion table for one byte
set i=0
for %%a in (0 1 2 3 4 5 6 7 8 9 a b c d e f) do (
   for %%b in (0 1 2 3 4 5 6 7 8 9 a b c d e f) do (
      set "H[!i!]=%%a%%b"
      set /A i+=1
   )
)

rem Byte variables are individual. 32-bits variables have 4 separated byte parts named: var0, var1, var2, var3

rem Define pre-computed round constants for key scheduler algorithm
set i=1
for %%a in (01 02 04 08 10 20 40 80 1b 36) do set /A "Rcon[!i!]=0x%%a, i+=1"

rem Define pre-computed Rijndael S-box values for SubBytes() function and key scheduler algorithm
set i=0
for %%s in ("637c777bf26b6fc53001672bfed7ab76"
            "ca82c97dfa5947f0add4a2af9ca472c0"
            "b7fd9326363ff7cc34a5e5f171d83115"
            "04c723c31896059a071280e2eb27b275"
            "09832c1a1b6e5aa0523bd6b329e32f84"
            "53d100ed20fcb15b6acbbe394a4c58cf"
            "d0efaafb434d338545f9027f503c9fa8"
            "51a3408f929d38f5bcb6da2110fff3d2"
            "cd0c13ec5f974417c4a77e3d645d1973"
            "60814fdc222a908846eeb814de5e0bdb"
            "e0323a0a4906245cc2d3ac629195e479"
            "e7c8376d8dd54ea96c56f4ea657aae08"
            "ba78252e1ca6b4c6e8dd741f4bbd8b8a"
            "703eb5664803f60e613557b986c11d9e"
            "e1f8981169d98e949b1e87e9ce5528df"
            "8ca1890dbfe6426841992d0fb054bb16"
           ) do (
   set "s=%%~s"
   for /L %%j in (0,2,%blockLen%) do set /A "s[!i!]=0x!s:~%%j,2!, i+=1"
)


rem AES: KeyExpansion(byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
:KeyExpansion
rem AES: begin

rem Split the key string in bytes
set i=0
for /L %%j in (0,2,%keyLen%) do set /A "key[!i!]=0x!key:~%%j,2!, i+=1"

rem AES:    word temp
rem AES:    i = 0
rem AES:    while (i < Nk)
rem AES:       w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
rem AES:       i = i+1
rem AES:    end while
set /A NkM1=Nk-1, j=0
for /L %%i in (0,1,%NkM1%) do (
   for %%b in (0 1 2 3) do for %%j in (!j!) do set /A "w%%b[%%i]=!key[%%j]!, j+=1"
)

if %Nk% gtr 6 (
   rem Define the "else if" macro for this case that is used below
   set "else if Nk gtr 6 and iModNk equ 4 set temp to SubWord[temp]=) else if !bang!iModNk!bang! equ 4 ( for %%b in (0 1 2 3) do set

/A temp%%b=s[!bang!temp%%b!bang!]"
)

rem AES:    i = Nk
rem AES:    while (i < Nb * (Nr+1)]
rem AES:       temp = w[i-1]
rem AES:       if (i mod Nk = 0)
rem AES:          temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
rem AES:       else if (Nk > 6 and i mod Nk = 4)
rem AES:          temp = SubWord(temp)
rem AES:       end if
rem AES:       w[i] = w[i-Nk] xor temp
rem AES:       i = i + 1
rem AES:    end while

set /A wLen=Nb*(Nr+1)-1
for /L %%i in (%Nk%,1,%wLen%) do (
   set /A "iM1=%%i-1, iModNk=%%i%%Nk, iDivNk=%%i/Nk, iMNk=%%i-Nk"
   for /F "tokens=1-3" %%j in ("!iM1! !iDivNk! !iMNk!") do (
      rem j=i-1, k=i/Nk, l=i-Nk
      rem temp = w[i-1]:
      for %%b in (0 1 2 3) do set "temp%%b=!w%%b[%%j]!"
      if !iModNk! equ 0 (
         rem RotWord(temp^):
         for /F "tokens=1,2,3,4" %%a in ("!temp1! !temp2! !temp3! !temp0!") do (
            rem temp = SubWord(...^) xor Rcon[i/Nk]:
            set /A "temp0=s[%%a] ^ Rcon[%%k], temp1=s[%%b], temp2=s[%%c], temp3=s[%%d]"
         )
      %else if Nk gtr 6 and iModNk equ 4 set temp to SubWord[temp]%
      )
      rem w[i] = w[i-Nk] xor temp:
      for %%b in (0 1 2 3) do set /A "w%%b[%%i]=w%%b[%%l] ^ temp%%b"
   )
)

rem AES: end KeyExpansion

rem This section show the key expansion
%TRACE% for /L %%i in (0,1,%wLen%) do (
%TRACE%    set "line=%%i- "
%TRACE%    for %%b in (0 1 2 3) do for %%h in (!w%%b[%%i]!) do set "line=!line!!H[%%h]!"
%TRACE%    echo !line!>&2
%TRACE% )
%TRACE% echo/>&2


if /I "%fileExt%" equ ".aes" goto Decryption

:Encryption

rem Define multiplication tables for multiplying by 2 and 3 over the
rem Rijndael Galois finite field.  Required for MixColumns() function.

for /L %%x in (0,1,255) do set /A "x=%%x, G3%%x=x, x<<=1, x^=(x>>8)*0x11B, G2%%x=x,G3%%x^=x"

rem Convert input file to hexadecimal ASCII digits, AND
rem encrypt hexadecimal digits to AES format via multi-process method
echo %time% - Start conversion AND encryption of file
call :BinToHex "%fileName%%fileExt%"
echo %time% - End conversion AND encryption
goto :EOF


:BinToHex binFile
set "tempFile=%temp%\BinToHex.tmp"
del "%tempFile%" 2>NUL
fsutil file createnew "%tempFile%" %filesize% >NUL
del pipeFile?.txt 2>NUL
del "%fileName%%fileExt%.aes" 2>NUL
set /A a=0, record=-1
set "output="
for /F "skip=1 tokens=1,2 delims=: " %%b in ('fc /B "%~1" "%tempFile%"') do (
   set /A b=0x%%b
   if !a! neq !b! (
      set /A c=b-a
      for /L %%i in (1,1,!c!) do (
         set "output=!output!00"
         if "!output:~31,2!" neq "" (
            set /A record+=1, nextTurn=record %% N
            echo !output!>> pipeFile!nextTurn!.txt
            set "output="
            rem Multi-process starter:
            if !record! lss %N% start "" /B "%~F0" :Encrypt !nextTurn!
         )
      )
   )
   set /A a=b+1
   set "output=!output!%%c"
   if "!output:~31,2!" neq "" (
      set /A record+=1, nextTurn=record %% N
      echo !output!>> pipeFile!nextTurn!.txt
      set "output="
      rem Multi-process starter:
      if !record! lss %N% start "" /B "%~F0" :Encrypt !nextTurn!
   )
)
if !a! neq %filesize% (
   set /A c=filesize-a
   for /L %%i in (1,1,!c!) do (
      set "output=!output!00"
      if "!output:~31,2!" neq "" (
         set /A record+=1, nextTurn=record %% N
         echo !output!>> pipeFile!nextTurn!.txt
         set "output="
      )
   )
)
if "!output!" neq "" (
   set pad=0
   for /L %%i in (2,2,30) do (
      if "!output:~%%i,2!" equ "" (
         set "output=!output!00"
         set /A pad+=2
      )
   )
   set /A record+=1, nextTurn=record %% N
   echo !output!>> pipeFile!nextTurn!.txt
   REM echo !output! !pad!
)
del "%tempFile%"
:waitForEncryption
if exist pipeFile?.txt goto waitForEncryption
exit /b


:Encrypt turn
set /A NbM1=Nb-1, NkM1=Nk-1, NrM1=Nr-1
call :Cipher %1 < pipeFile%1.txt
del pipeFile%1.txt
exit

rem AES: Cipher(byte in[4*Nb], byte out[4*Nb], word w[Nb*(Nr+1)])
:Cipher turn
rem AES: begin
rem AES:    byte state[4,Nb]
rem AES:    state = in

   set "in="
   set /P in=
   if not defined in goto endCipher
   set p=0
   for /L %%c in (0,1,%NbM1%) do (
      for /L %%r in (0,1,%NkM1%) do (
         for %%p in (!p!) do set /A "state[%%r][%%c]=0x!in:~%%p,2!, p+=2"
      )
   )
   %TRACE% call :ShowMatrix "round[0].input " state >&2

rem AES:    AddRoundKey(state, w[0, Nb-1]^)
   for /L %%c in (0,1,%NbM1%) do (
      for %%b in (0 1 2 3) do (
         set /A "state[%%b][%%c]^=w%%b[%%c]"
         %TRACE% set sch[%%b][%%c]=!w%%b[%%c]!
      )
   )
   %TRACE% call :ShowMatrix "round[0].k_sch " sch >&2

rem AES:    for round = 1 step 1 to Nr–1
rem AES:       ShiftRows(state^)
rem AES:       SubBytes(state^)
rem AES:       MixColumns(state^)
rem AES:       AddRoundKey(state, w[round*Nb, (round+1^)*Nb-1]^)
rem AES:    end for
   for /L %%i in (1,1,%NrM1%) do (
      %TRACE% echo/>&2
      %TRACE% call :ShowMatrix "round[%%i].start " state >&2

      rem ShiftRows(state^):
      set a=0
      for %%s in ("0 1 2 3" "1 2 3 0" "2 3 0 1" "3 0 1 2") do (
         for /F "tokens=1-5" %%a in ("!a! %%~s") do (
            for /F "tokens=1-4" %%B in ("!state[%%a][%%b]! !state[%%a][%%c]! !state[%%a][%%d]! !state[%%a][%%e]!") do (
               rem SubBytes(state^):
               set /A "state[%%a][0]=s[%%B], state[%%a][1]=s[%%C], state[%%a][2]=s[%%D], state[%%a][3]=s[%%E]"
            )
         )
         set /A a+=1
      )
      %TRACE% call :ShowMatrix "round[%%i].s_row " state >&2

      rem MixColumns(state^):
      set /A l=%%i*Nb
      for /L %%c in (0,1,%NbM1%) do (
         for /F "tokens=1-4" %%F in ("!state[0][%%c]! !state[1][%%c]! !state[2][%%c]! !state[3][%%c]!") do (
            for %%m in ("0=G2 G3 + +" "1=+ G2 G3 +" "2=+ + G2 G3" "3=G3 + + G2") do (
               for /F "tokens=1-5 delims== " %%A in (%%m) do (
                  rem AddRoundKey(state, w[round*Nb, (round+1^)*Nb-1]^):
                  set /A "state[%%A][%%c]=(%%B%%F) ^^ (%%C%%G) ^^ (%%D%%H) ^^ (%%E%%I) ^^ (w%%A[!l!])"
               )
            )
         )
         set /A l+=1
      )
      %TRACE% call :ShowMatrix "round[%%i].m_col " state >&2

      REM This section show SCH matrix
      %TRACE% set /A l=%%i*Nb
      %TRACE% for /L %%c in (0,1,%NbM1%) do (
      %TRACE%    for %%b in (0 1 2 3) do (
      %TRACE%       for %%l IN (!l!) do set sch[%%b][%%c]=!w%%b[%%l]!
      %TRACE%    )
      %TRACE%    set /A l+=1
      %TRACE% )
      %TRACE% call :ShowMatrix "round[%%i].k_sch " sch >&2

   )

rem AES:    ShiftRows(state^)
rem AES:    SubBytes(state^)
   rem ShiftRows(state^):
   set a=0
   for %%s in ("0 1 2 3" "1 2 3 0" "2 3 0 1" "3 0 1 2") do (
      for /F "tokens=1-5" %%a in ("!a! %%~s") do (
         for /F "tokens=1-4" %%B in ("!state[%%a][%%b]! !state[%%a][%%c]! !state[%%a][%%d]! !state[%%a][%%e]!") do (
            rem SubBytes(state^):
            set /A "state[%%a][0]=s[%%B], state[%%a][1]=s[%%C], state[%%a][2]=s[%%D], state[%%a][3]=s[%%E]"
         )
      )
      set /A a+=1
   )

rem AES:    AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])
   set /A l=Nr*Nb
   for /L %%c in (0,1,%NbM1%) do (
      for %%b in (0 1 2 3) do (
         set /A "state[%%b][%%c]^^=w%%b[!l!]"
         %TRACE% for %%l in (!l!) do set sch[%%b][%%c]=!w%%b[%%l]!
      )
      set /A l+=1
   )
   %TRACE% echo/>&2
   %TRACE% call :ShowMatrix "round[-].k_sch " sch >&2
   %TRACE% call :ShowMatrix "round[-].outpt " state >&2
   %TRACE% echo =============================================== >&2

rem AES:    out = state
   call :ShowMatrix "" state >> "%fileName%%fileExt%.aes"
goto Cipher

rem AES: end Cipher
:endCipher
exit /B


:ShowMatrix Header: Matrix
set "line=%~1"
for /L %%j in (0,1,3) do (
   for /L %%i in (0,1,3) do (
      for %%h in (!%2[%%i][%%j]!) do (
         set "line=!line!!H[%%h]!"
      )
   )
)
echo !line!
exit /B


:Decryption

REM Under development

rem Define pre-computed multiplication tables for multiplying by 9, 0xb (11), 0xd (13) and 0xe (14) over the
rem Rijndael Galois finite field.  Required for InvMixColumns() function.

rem        Factor:                  9          b          d          e
for /L %%x in (0,1,255) do (
   rem     bit 0 [1]:               1          1          1          0
   set /A "x=%%x,                  G9[%%x]=   Gb[%%x]=   Gd[%%x]=x"
   rem     bit 1 [2]:               0          1          0          1
   set /A "x<<=1, x^=(x>>8)*0x11b,            Gb[%%x]^=x,           Ge[%%x]=x"
   rem     bit 2 [4]:               0          0          1          1
   set /A "x<<=1, x^=(x>>8)*0x11b,                       Gd[%%x]^=x,Ge[%%x]^=x"
   rem     bit 3 [8]:               1          1          1          1
   set /A "x<<=1, x^=(x>>8)*0x11b, G9[%%x]^=x,Gb[%%x]^=x,Gd[%%x]^=x,Ge[%%x]^=x"
)

goto :EOF


There are other parts of the original code that just can not be optimized, like the multi-process control method. If we analyze the purpose of the multi-process feature we would conclude that this technique allows to encrypt/decrypt a file in a fraction of the time that it would take with just one process. Of course, this goal is achieved dividing the file by the number of processes, that is to say, if there are 3 parallel tasks, each one would process 1/3 of the file. Although in real life this fraction is not exact, the idea is correct. We may develop an equivalent but much simpler method that consist in split the input file in N parts of the same size and then start N parallel tasks, so each one process one part. When all tasks ends, the generated parts are just joined. This method does not waste CPU resources in a complex controller and allows to use all available CPU cores in the encrypt/decript process.

Although the objective of this thread is to develop this method in a pure Batch program, the final hexadecimal to binary conversion can not be achieved with Batch commands, so if VBS (or JScript) code is already included in the program in order to achieve this conversion, why not add more code to the non-Batch section? I will also develop a JScript version of the encrypt/decrypt routines just to complete this topic; it is expected that this version be the fastest one.

Antonio
Last edited by Aacini on 19 Nov 2013 00:03, edited 2 times in total.

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: Multi-process Advanced Encryption Standard (AES)

#27 Post by Magialisk » 11 Nov 2013 10:09

Antonio you've touched on a lot of key points in your post above, many of them I've wrestled with myself in the construction of this batch.

First your questions about making the code shorter, removing the constants, using algorithms to create them, etc. I actually addressed that in the original string encryption thread here: viewtopic.php?f=3&t=4579&start=22. You'll see in that linked post that I was already wrestling with myself, and at that point decided to write an algorithm as opposed to typing in over 2000 constants. However, over the next few posts in that thread you'll see how tragically slow it was to generate the constants in real time, so I modified my algorithm to write a text file in the format "set G[1]=xxx & set...". I took this output text file and pasted it into the code directly, hence the constant tables you are familiar with. Since you were asking how I generated the constants though, you'll find the algorithm I used in that thread so you can re-write it for your own use.

What it really boils down to is a time-memory tradeoff. The lookup table approach turns a very complicated calculation into a quick set command, so it runs much faster, however it takes up a lot of memory storing the ~2K lookup constants in the environment. The algorithm to generate the constants is quite short in terms of memory, but takes a while to run as it requires multiple loop iterations, etc. That's the tradeoff. Of note, the algorithm can be sped up greatly if someone can help me "break" the FOR loop early. Instead of always running 8 iterations, it can break when either A or B input reaches 0, but nothing I tried was able to break the loop properly. I asked for ideas in that original thread, but traffic in there was low, so maybe someone here will have an answer.

Now on to your second question, why not use more Jscript/VBscript since the batch is already "polluted" with a scripted subroutine. I again have gone back and forth on this. Doing the Bin2Hex subroutine in another language would remove the requirement for this script to run as administrator for example (by removing fsutil). To me that's a strong advantage to consider. However, the object of this demo/project/whatever was to do it in batch. Otherwise why not just sit down, write some C++ and '#include <crypto.h>'. AES has been done, and done efficiently, in just about every high level language, so reinventing the wheel there didn't even seem to add much educational value. In batch on the other hand I learned quite a bit just coding this myself, and have learned even more with you experts picking it apart. That and it rounds out my crypto batch library quite nicely, coming after my earlier implementations of Vigenere's Cipher, the Enigma machine, and other cryptographic algorithms. You can read about those efforts in the string encoding thread I linked above.

So long story short, I made a personal decision to not include any other language code except for the one routine which absolutely cannot be done in batch (there's no way to represent the 0x00 NULL character). If that script could be replaced with batch code I'd take the speed penalty and replace it just for the sake of "purity" but that's me :D If you re-implement the algorithm in JScript, however, I do agree with you it should run orders of magnitude faster. Please post the result back here if you do go that route.

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: Multi-process Advanced Encryption Standard (AES)

#28 Post by Magialisk » 11 Nov 2013 10:13

jeb wrote:
einstein1969 wrote:But i don't know why :? . An expert is required!


Ok, that's simple.

There exists a two lines rule inside of brackets when using labels.
SO:Unexpected “The system cannot find the drive specified.” in batch file


Einstein and Jeb thank you very much for tracking down this problem. You'll see that I'm no stranger to this, and everywhere else in the code where my comment labels run 2+lines I used the '%\n%' continuation to avoid the problem. In this place apparently I forgot and just never noticed when trying to troubleshoot.

It's maddening when you're trying to debug something like this and your eyes only see what they want to instead of what's really on the page :evil: Thanks again, that nuisance error was really bugging me :D

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: Multi-process Advanced Encryption Standard (AES)

#29 Post by Magialisk » 11 Nov 2013 11:48

Antonio, I ran some tests with your code and it sure points out how inefficient mine is, particularly running as only 1 process. Unfortunately I don't have my latest code with me at work, as I was tinkering this weekend and forgot to grab it out the way out this morning, but what I have here is good enough for demonstration. I did have a couple of problems with your code as well, so I'll point those out as I go along.

First I created a .txt file and put a 32-char hex data in there. I tested our two programs using the same key:

Code: Select all

c:\BTP\AESmult>aa-aes testdata.txt 000102030405060708090a0b0c0d0e0f
 8:42:26.22 - Start conversion of file to hexadecimal digits
 8:42:26.29 - End conversion

 8:42:26.29 - Start encryption of hexadecimal file
Missing operand.
Missing operand.
Missing operand.
   [many more 'missing operand' messages removed for brevity]
Missing operand.
Missing operand.
Missing operand.
Missing operand.
 8:42:27.00 - End encryption

c:\BTP\AESmult>aes testdata.txt test.key 1
Converting input file to ASCII-coded hexadecimal...
...Hexadecimal Conversion Complete (Elapsed Time: 00:00:00.27)
Beginning AES Encryption with 1 PIDs...
...AES Encryption Complete (Elapsed Time: 00:00:01.07)

c:\BTP\AESmult>aes testdata.txt test.key
Converting input file to ASCII-coded hexadecimal...
...Hexadecimal Conversion Complete (Elapsed Time: 00:00:00.17)
Beginning AES Encryption with 24 PIDs...
...AES Encryption Complete (Elapsed Time: 00:00:00.50)
So off the bat it's apparent that you're code takes about 0.7s to run vs. mine taking about 1s. You shaved about 30% of the time off, which is impressive! Of course even when running "1 process" mine has a ton of overhead due to the PID handler, so it's not quite apples-to-apples. Either way, great job! Allowing mine to use more than 1 process it breaks the file into two 128-bit chunks and the time is cut in half, which is exactly what you'd expect. So if you add multi-processing to yours you'll have me beat for sure. That is, as long as your process handler isn't as horribly inefficient as mine :D (more to come on that later)

Now the couple of problems I found. First of course are the "Missing Operand" messages, there were about 10 of them per 128-bit block being processed. I'm on Win7 64-bit for what it's worth. I haven't dug through your code yet to see where it's coming from, but I have to imagine printing that message is slowing down your times. If I get some time I'll see if I can locate it for you, it's obviously in a loop somewhere so hopefully that narrows it down.

Now the second problem, our two scripts did not give the same output file :!: Here's what I got encrypting the same text file with the same key:

Code: Select all

aa-aes-result:
ec8a50eac61a5831d9bdc307fa566333
0adcca242f842b9e49de91e6d396f15f

aes-result:
14DE589684A386FA96EB99A8CA849507
B019B8522956B08EB907148108635684
I hate to say it, but I've been very careful about testing my algorithm against the FIPS standard after every enhancement I make to be sure not to break anything, so I have to assume that it's your algorithm producing the incorrect result? I'm going to look foolish if I turn out to be wrong there, but otherwise I think you have a glitch. As with the other one, I'll try to root this out for you when I have some free time. I can probably compare each of our round functions against each other and same with the key schedulers to see where one of us gives a different answer from the FIPS doc. There are test keys, test data, and example sequences in the back of the document you can walk through for that purpose.

Moving on to a second test just for the heck of it, I encrypted a 10KB file with both of our programs:

Code: Select all

 8:48:30:xx - Start conversion of file to hexadecimal digits
 x:xx:xx.xx - End conversion

 x:xx:xx.xx - Start encryption of hexadecimal file
Missing operand.
Missing operand.
Missing operand.
   [many more 'missing operand' messages removed for brevity]
Missing operand.
Missing operand.
 8:52:10.17 - End encryption

c:\BTP\AESmult>aes 10kdata.bin test.key 1
Converting input file to ASCII-coded hexadecimal...
...Hexadecimal Conversion Complete (Elapsed Time: 00:00:09.30)
Beginning AES Encryption with 1 PIDs...
...AES Encryption Complete (Elapsed Time: 00:06:58.24)

c:\BTP\AESmult>aes 10kdata.bin test.key
Converting input file to ASCII-coded hexadecimal...
...Hexadecimal Conversion Complete (Elapsed Time: 00:00:09.27)
Beginning AES Encryption with 24 PIDs...
...AES Encryption Complete (Elapsed Time: 00:00:52.72)
The biggest takeaway here is just how terribly inefficient my code is with only 1 process running. While yours was running it used between 64-95% of one CPU core, averaging ~85%. Mine on the other hand was showing light activity across 7 cores, just to run the one encryption process, and eating about 24% of my 12-thread Xeon :!: . Even still, mine took almost twice as long to finish in this mode, pathetic. Obviously mine is not written for single-process execution and there is WAAAY too much overhead for this in the PID handler routine. I did some napkin math on how long mine would take to run in a purely single-threaded mode, with no "handler" code at all, and it came out right around 4 minutes. So I think in that sense we're somewhat even (you'd have ~10% advantage?), it's just that I made a huge efficiency sacrifice on single threads to allow for multiple threads. As shown, when my program was allowed to use more processes, it ran ~7x faster, going from half as fast as yours to >3x faster. It would definitely be nice to improve my handler because I can see on a low core count machine (even a quad) it's going to eat too much CPU to be helpful...

In any case I think you've done a great job here, particularly banging this out in only a weekend. I saw some really clever techniques in your code that I'll probably borrow to improve my own. It's going to be fun timing individual FOR loops to see where your or my implementation might be faster, and use the better performer. I'd also really like to experiment with breaking the file into pieces for processing and recombining at the end. I considered that approach early on but assumed the time spent breaking up the file and recombining it would be longer than the overhead imposed by a process handler. Seeing now just how bad my handler is I might have been wrong there :D

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: Multi-process Advanced Encryption Standard (AES)

#30 Post by Magialisk » 11 Nov 2013 13:24

Antonio,
I found a bug that is causing your code to generate improper values. In "round 0" (before you start the actual encryption rounds) you need to XOR the input key with the input data. I used your built-in trace routines and compared to my built-in debug routines and I can attest that our key schedulers are giving identical output (I only tried 128-bit keys).

After the scheduler, you display round(0) data and round(0) key, but you forgot to XOR them together prior to beginning round(1). Your round(1).start shows the same data as your round(0).input. Example:

Code: Select all

c:\BTP\AESmult>aa-aes testdata.txt 00112233445566778899aabbccddeeff
     [key expansion output removed - confirmed working]

10:55:30.55 - Start conversion of file to hexadecimal digits
10:55:30.76 - End conversion

10:55:30.76 - Start encryption of hexadecimal file
round[0].input 30303131323233333434353536363737
Missing operand.
Missing operand.
Missing operand.
Missing operand.
Missing operand.
Missing operand.     <-- I haven't looked for what's causing these yet
Missing operand.         Thought I'd look for the algorithm bug first...
Missing operand.
Missing operand.
Missing operand.
Missing operand.
Missing operand.
Missing operand.
Missing operand.
Missing operand.
Missing operand.
round[0].k_sch 00112233445566778899aabbccddeeff

round[1].start 30303131323233333434353536363737   <--that's a problem
round[1].s_row 0423969a23189ac71805c7c30504c396
round[1].m_col ad4091eed70d147137762ea393ed2bb4
round[1].k_sch c0393478846c520f0cf5f8b4c028164b
Here's the debug output from my old original code for comparison:

Code: Select all

c:\BTP\AES>aes enc data key
input data: 30303131323233333434353536363737
Round 0 key: 00112233445566778899AABBCCDDEEFF

Expanded round keys:
     [removed, same as yours]

Initial AddRoundKey result: 3021130276675544BCAD9F8EFAEBD9C8  <-- This is the step you're missing
                                                                  XOR before starting round 1
Begin round #1:
SubBytes result:    04FD7D773885FC1B6595DB192DE935E8
ShiftRows result:   0485DBE83895357765E97D1B2DFDFC19
MixColumns result:  AF8B0F9996215E068C305B0DA3CA1844
Round key:          C0393478846C520F0CF5F8B4C028164B  <-- Notice key schedule is the same
AddRoundKey result: 6FB23BE1124D0C0980C5A3B963E20E0F

I didn't dig in much deeper yet, but on a cursory examination it looks like your combination of SubBytes() and ShiftRows() is working fine. If you look at the first two bytes of input '30h' it gets shifted by 0 and then substituted with '04', so we both agree on that. I spot checked another value or two and it looked OK. Unfortunately, trying to figure out whether MixColumns() is working is futile when it's being fed incorrect input. I'm out for a while but I'll take another look at your code for you a little later. Hopefully this is the only flaw, as once yours is demonstrated to give the right answer I want to dig in and steal some clever bits :D

Post Reply