Efficient Array Management Operations in Batch files

#1 Post by Aacini » 22 Sep 2021 22:08

This thread presents a different way to manipulate Arrays (written as N-Tuples) in a Batch file. A Vector is a list of elements of the same type, separated by exactly one space. A Matrix is a series of vectors (rows), each one stored in its own variable; the variables share the same name and end in a digit from 1 to 9. The rows of a Matrix may differ in type or in number of elements (columns), but they must all have the same number of columns in order to be processed by the numeric Matrix subroutines.

The Efficient Array Management subroutines perform operations over all Vector elements via a single long command line, so they don't use slow FOR loops. The Arrays are defined and manipulated in an easy way that doesn't impose complicated constraints, so the subroutines are simple to understand and use.
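
As a minimal sketch of the "single long command line" idea (not one of the package subroutines; the names vector and sum are just illustrative), the following adds up the elements of a vector without a FOR loop. The percent expansion replaces every space in the vector with a piece of command text, so the one line grows at parse time into: set "x=22" & set /A "sum+=x" & set "x=93" & set /A "sum+=x" & ... & set "x=3" & set /A "sum+=x"

```batch
@echo off
setlocal

rem Illustrative data: a vector with elements separated by exactly one space
set "vector=22 93 5 10 3"
set "sum=0"

rem Each space in the vector is replaced by command text at parse time,
rem so this single line becomes one SET plus one SET /A per element
set "x=%vector: =" & set /A "sum+=x" & set "x=%" & set /A "sum+=x"

echo sum = %sum%
```

This should show sum = 133. The trick depends on the exact one-space separation: a double space would produce an empty element and a malformed command.
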
  1. Creation of Arrays

    The simplest way to create Vectors and Matrices is to write them directly in SET commands. For example:

    Code: Select all

    set "NumericVector=1 1 2 3 5 8"
    set "NamesVector=John Paul George Ringo"
    set "Matrix1=10 20 30"
    set "Matrix2=110 120 130"
    set "Matrix3=501 502 503"
    set "Matrix4=700 800 900"
    
    Please be aware that the elements of an N-Tuple must be separated by exactly one space in order for the manipulation subroutines to work properly. Note that the mathematical subroutines do not check that a Vector or Matrix is well formed, so if the operands for these subroutines were not created correctly, the program will simply fail.
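
Since the subroutines do not validate their operands, data that comes from elsewhere may need to be normalized first. The following is a small sketch of one possible way to do it, not part of the package; raw and clean are illustrative names. It rebuilds the vector with a plain FOR, which also splits on commas, semicolons and equal signs.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Illustrative input with irregular spacing
set "raw=1   1 2  3 5   8"

rem Rebuild the vector with exactly one space between elements
set "clean="
for %%e in (%raw%) do set "clean=!clean! %%e"
if defined clean set "clean=!clean:~1!"

echo [%clean%]
```

This should show [1 1 2 3 5 8]. The sketch is only suitable for simple elements: a plain FOR treats elements containing * or ? as file masks, and exclamation marks are lost under delayed expansion.
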

    Generating Vectors of constants or of incrementing indices is straightforward; it is shown here just for reference:

    Code: Select all

    set "zeros=" & (for /L %%i in (1,1,%n%) do set "zeros=!zeros! 0") & set "zeros=!zeros:~1!"
    
    set "index=" & (for /L %%i in (1,1,%m%) do set "index=!index! %%i") & set "index=!index:~1!"
    
    rem Previous vector can also be created in this simpler way:
    call :indexGen index=%m%
    
    for /F "delims==" %%v in ('set identity 2^>nul') do set "%%v="
    for /L %%i in (1,1,4) do (
       for /L %%j in (1,1,4) do (
          set /A "n=^!(%%i-%%j)"
          set "identity%%i=!identity%%i! !n!"
       )
       set "identity%%i=!identity%%i:~1!"
    )
    
    In all management subroutines a vector/matrix parameter is written either as the name of a variable or as a vector constant enclosed in quotes; in most subroutines the second vector/matrix can also be a single (scalar) number enclosed in quotes.

    The subroutines do not use a setlocal command, which makes them faster (the only exception is indexOf, marked as a special case in the code). All internal subroutine variables have single-letter names, so if you avoid one-letter variable names in your own code, your variables will not conflict with those of the subroutines.


  2. Standard Operations on Vectors

    The :vectorOp subroutine performs a series of operations over one or two vectors.

    Code: Select all

    call :vectorOp result=vector1 oper [vector2 | "number"]
    
    The oper can be any SET /A binary operator: * / % + - << >> & ^ |. The percent (modulo) operator must be written 4 times, and poison characters must be enclosed in quotes as usual. Additionally, the Max operator gives the greater of the two operands and Min the lesser one. The indexOf operator returns the position in the first vector of each element of the second operand, or 0 if the element does not exist. For example:

    Code: Select all

    call :vectorOp resul="2 3 4 5" + "7 8 9 10"
    echo resul = %resul%
    resul = 9 11 13 15
    
    If the second operand is not given, oper must be one of the unary operators ! ~ - and it is applied to each vector element:

    Code: Select all

    call :vectorOp resul="7 -8 9 -10" -
    echo resul = %resul%
    resul = -7 8 -9 10
    

  3. Special Operations

    Bracket Indexing.

    If the operator is [] (called Bracket Indexing), the elements of the first operand are selected by the vector of indices given in the second operand:

    Code: Select all

    call :vectorOp result="22 93 5 10 3" [] "1 4 5"
    echo %result%
    22 10 3
    
    Note that index elements must be sorted in ascending order.

    Reduction and Scan.

    If the second operand is "//" (called Reduction; the quotes are mandatory!), then the operator is applied across all elements of the vector, combining them into a single result. For example, to add up a vector of numbers:

    Code: Select all

    call :vectorOp result="22 93 5 10 3" + "//"
    echo %result%
    133
    
    The "\\" operand (called Scan) is similar, except that it works cumulatively on the data and returns all the intermediate results:

    Code: Select all

    call :vectorOp result="22 93 5 10 3" + "\\"
    echo %result%
    22 115 120 130 133
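
For comparison, here is a plain FOR-loop reference of what Reduction and Scan compute, using * as the operator. It is only a sketch to illustrate the semantics, not the single-line method used by the package; vector, acc and scan are illustrative names.

```batch
@echo off
setlocal EnableDelayedExpansion

rem FOR-loop reference for the Reduction and Scan of a vector with *
set "vector=1 2 3 4 5 6"

rem Start from 1, the neutral element of multiplication
set "acc=1"
set "scan="
for %%e in (%vector%) do (
   set /A "acc*=%%e"
   set "scan=!scan! !acc!"
)

echo reduction = %acc%
echo scan      = !scan:~1!
```

The reduction line should show 720 (the factorial of 6) and the scan line 1 2 6 24 120 720. The start value has to be the neutral element of the chosen operator, for example 0 for + and 1 for *.
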
    
    Compress and Expand.

    These special symbols can also be placed in the operator position. In this case // is called Compress and selects elements from the vector according to the "zero or one" (Boolean) values given in the second operand:

    Code: Select all

    call :vectorOp result="22 93 5 10 3" // "1 0 1 0 1"
    echo %result%
    22 5 3
    
    Conversely, \\ (called Expand) inserts fill data into a vector: the elements of the first operand are placed at the positions marked by 1 in the second operand, and a 0 is inserted at each position marked by 0:

    Code: Select all

    call :vectorOp result="10 20 30" \\ "1 0 1 1 0"
    echo %result%
    10 0 20 30 0
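
The following FOR-loop sketch shows what Compress and Expand compute, using the same data as the two examples above. It is a reference for the semantics only, not the package's single-line implementation; rest and out are illustrative names. A FOR /F with "tokens=1*" takes the first remaining data element and keeps the rest for the next step.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Compress: keep the data elements whose mask value is 1
set "rest=22 93 5 10 3"
set "out="
for %%m in (1 0 1 0 1) do (
   for /F "tokens=1*" %%a in ("!rest!") do (
      if %%m equ 1 set "out=!out! %%a"
      set "rest=%%b"
   )
)
echo compress = !out:~1!

rem Expand: take the next data element for each 1, insert a 0 for each 0
set "rest=10 20 30"
set "out="
for %%m in (1 0 1 1 0) do (
   if %%m equ 1 (
      for /F "tokens=1*" %%a in ("!rest!") do (
         set "out=!out! %%a"
         set "rest=%%b"
      )
   ) else (
      set "out=!out! 0"
   )
)
echo expand   = !out:~1!
```

This should show 22 5 3 for Compress and 10 0 20 30 0 for Expand. Compress consumes one data element per mask value, while Expand consumes one only when the mask value is 1.
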
    
    Outer and Inner Products (dot operator).

    If the operator is the degree-dot-oper combination (called Outer Product, as in °.+ with plus as the example oper), then the operator is applied to all combinations of elements of the two vectors. The result is a two-dimensional matrix with one row per element of the first operand and one column per element of the second operand. For example:

    Code: Select all

    call :vectorOp result="2 4 6" °.+ "10 20 30 40"
    set result
    result1=12 22 32 42
    result2=14 24 34 44
    result3=16 26 36 46
    for /F "tokens=1* delims==" %%a in ('set result') do echo %%b
    12 22 32 42
    14 24 34 44
    16 26 36 46
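
A nested FOR-loop sketch that produces the same result1..result3 rows as the °.+ example above. It only illustrates what the Outer Product computes and is not the package's method; left, right and row are illustrative names.

```batch
@echo off
setlocal EnableDelayedExpansion

rem FOR-loop sketch of the outer product of two vectors with the + operator
set "left=2 4 6"
set "right=10 20 30 40"

rem Row number i of the result is stored in the variable named result plus i
set "i=0"
for %%a in (%left%) do (
   set /A "i+=1"
   set "row="
   for %%b in (%right%) do (
      set /A "z=%%a+%%b"
      set "row=!row! !z!"
   )
   set "result!i!=!row:~1!"
)

set result
```

Replacing the + in the SET /A expression with another binary operator gives the corresponding outer product. Because rows are identified by a single digit, this layout is limited to 9 rows.
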
    
    The Inner Product combines two operators, as in +.*. In this case both operands must be two-dimensional arrays, and the number of columns in the first matrix must be equal to the number of rows in the second matrix. First, each row of the left operand is combined element by element with each column of the right operand using the rightmost operator of the inner product; then the leftmost operator is applied to those intermediate results, as in a Reduction (//) operation.

    In this way, the +.* inner product performs standard matrix multiplication, but you can use any of the SET /A binary operators on each side of the dot, including the additional Max and Min operators. Enclose poison characters in quotes, as usual. For example, the min.max inner product first gets the Max value of each pair of elements obtained by matching a left row with a right column. After that, it gets the Min of those intermediate values and places it at the corresponding row/column position of the result matrix.

    Note that the Inner Product is an operation over matrices, not vectors, so it cannot be used in the :vectorOp subroutine, only in the :matrixOp one described next.
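
To make the +.* case concrete, here is a plain FOR-loop sketch of a 2x2 matrix multiplication with the row-per-variable layout. It is not the :matrixOp subroutine; the names A1, A2, B1, B2, C1, C2, the B_k_j helper cells and the rowsA, rowsB, colsB counters are all assumptions made for this illustration.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Two illustrative 2x2 matrices stored one row per variable
set "A1=1 2"
set "A2=3 4"
set "B1=5 6"
set "B2=7 8"
set "rowsA=2" & set "rowsB=2" & set "colsB=2"

rem Unpack B into B_k_j cells so that its columns can be addressed directly
for /L %%k in (1,1,%rowsB%) do (
   set "j=0"
   for %%e in (!B%%k!) do (
      set /A "j+=1"
      set "B_%%k_!j!=%%e"
   )
)

rem Multiply each row of A by each column of B, then add up the products
for /L %%i in (1,1,%rowsA%) do (
   set "row="
   for /L %%j in (1,1,%colsB%) do (
      set "k=0"
      set "sum=0"
      for %%a in (!A%%i!) do (
         set /A "k+=1"
         set /A "sum+=%%a*B_!k!_%%j"
      )
      set "row=!row! !sum!"
   )
   set "C%%i=!row:~1!"
)

for /L %%i in (1,1,%rowsA%) do echo C%%i=!C%%i!
```

This should show C1=19 22 and C2=43 50. The * in the inner SET /A plays the role of the rightmost operator and the += the role of the leftmost one, so other inner products would need a different expression and start value.
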
===============================================================================

Neither the Inner nor the Outer Product operation is implemented yet in this first version of the application. The code is here:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem Efficient Array Management Operations in Batch files
rem Antonio Perez Ayala - 2021/Sep v.1

rem The Efficient Array Management subroutines are placed below the equal-signs line
rem You must extract those subroutines and include them in your own program
rem
rem The code below contains simple examples of the use of those subroutines
rem and a session for the interactive execution of lines of code

for /F "delims==" %%v in ('set a[ 2^>nul') do set "%%v="

cls
echo/
echo         Efficient Array Management Operations in Batch files
echo/

:exam

set "letter=A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"
echo letter = %letter%
echo/

echo call :vectorOp vowel=letter [] "1 5 9 15 21"
call :vectorOp vowel=letter [] "1 5 9 15 21"
echo vowel = %vowel%
echo/

echo call :vectorOp resul=letter indexOf vowel
call :vectorOp resul=letter indexOf vowel
echo resul = %resul%
echo/
echo -------------------------------------------
echo/

echo call :vectorOp resul="2 3 4 5" + "7 8 9 10"
call :vectorOp resul="2 3 4 5" + "7 8 9 10"
echo resul = %resul%
echo/

echo call :vectorOp resul="7 8 9 10" + "23"
call :vectorOp resul="7 8 9 10" + "23"
echo resul = %resul%
echo/

echo call :vectorOp resul="20 3 40 5" Max "7 8 9 10"
call :vectorOp resul="20 3 40 5" Max "7 8 9 10"
echo resul = %resul%
echo/

echo call :vectorOp resul="7 -8 9 -10" -
call :vectorOp resul="7 -8 9 -10" -
echo resul = %resul%
echo/

echo Enter command lines for interactive execution of these subroutines
echo Use delayed expansion to show ^^^!variable^^^! values
echo Enter HELP for help, EXAM to re-show examples or just press Enter to end

:nextCommand
echo/
echo Ready...
set "command="
set /P "command="
if not defined command goto :EOF
if /I "!command!" equ "EXAM" goto exam
if /I "!command!" equ "HELP" goto help

%command%

goto nextCommand

:help
echo/
echo Define Vectors as a list of elements separated by *one* space:
echo    set "NumericVector=1 1 2 3 5 8"
echo    set "NamesVector=John Paul George Ringo"
echo    call :indexGen index=4                  %%= Same as: set "index=1 2 3 4" =%%
echo/
echo :vectorOp subroutine performs operations over one or two vectors:
echo    call :vectorOp result=vector1 oper [vector2 ^| "number"]
echo/
echo ^<oper^> can be anyone of: * / %% + - ^<^< ^>^> ^& ^^ ^|      write %% 4 times, or:
echo    Max     - Get the Greater of two operands
echo    Min     - Get the Lesser of two operands
echo    indexOf - Get the indices of elements of vector2 inside vector1
echo    []      - (Bracket Indexing): Get elements selected by indices in vector2
echo    //      - (Compress): Get elements selected by (0,1) values in vector2
echo    \\      - (Expand): Insert fill data in positions indicated by 0 in vector2
echo/
echo If "//" is placed (with quotes) *in place of vector2* (Reduction),
echo then the oper is applied to all elements in the vector:
echo    call :vectorOp result="1 2 3 4 5 6" * "//"     %%= Factorial of 6 =%%
echo/
echo If "\\" is used instead (Scan), get all intermediate reduction results:
echo    call :vectorOp result="1 2 3 4 5 6" * "\\"     %%= All factorials of 1..6 =%%

goto nextCommand

=============================================================================

These are the Efficient Array Management subroutines
Copy this section to your own program

Antonio Perez Ayala
2021/Sep - version 1

:indexGen result=n
set "%1=" & (for /L %%i in (1,1,%~2) do set "%1=!%1! %%i") & set "%1=!%1:~1!"
exit /B


:vectorOp result=operand1 operator [operand2]
set "a=%2"
if %a:~0,1%%a:~-1% == "" (set "a=%~2") else set "a=!%~2!"
set "o=%~3" & set "r=" & set "s=" & set "p=%%" & set "b=%4"
if not defined b goto vectorOp1
if %b:~0,1%%b:~-1% == "" (set "b=%~4") else set "b=!%~4!"
if "%o%" == "[]" goto vectorOpBracket
if "%o%" == "//" goto vectorOpCompress
if "%o%" == "\\" goto vectorOpExpand
if /I "%o%" == "indexOf" goto vectorOpIndexOf
if /I "%o%" == "Max" set "o=x-z" & goto vectorOpMaxMin
if /I "%o%" == "Min" set "o=z-x" & goto vectorOpMaxMin
if "%b%" == "//" goto vectorOpReduction
if "%b%" == "\\" goto vectorOpScan

:vectorOpStd
set "x=%a: =" & set "y=!b:* =!" & call set "z=!p!b: !y!=!p!" & set /A "z=x!o!z" & set "r=!r! !z!" & set "b=!y!" & set "x=%" & set /A "z=x!o!y" & set "%1=!r:~1! !z!"
exit /B

:vectorOp1
set "x=%a: =" & set /A "z=!o!x" & set "r=!r! !z!" & set "x=%" & set /A "z=!o!x" & set "%1=!r:~1! !z!"
exit /B

:vectorOpBracket
set "n=0" & set "x=%a: =" & set "y=!b:* =!" & call set "m=!p!b: !y!=!p!" & set /A n+=1 & (if !n! equ !m! set "r=!r! !x!" & set "b=!y!") & set "x=%" & set "%1=!r:~1!"
exit /B

:vectorOpCompress
set "a=%b%" & set "b=%a%"
set "x=%a: =" & set "y=!b:* =!" & call set "z=!p!b: !y!=!p!" & (if "!x!" == "1" set "r=!r! !z!") & set "b=!y!" & set "x=%" & (if "!x!" == "1" set "r=!r! !y!") & set "%1=!r:~1!"
exit /B

:vectorOpExpand
set "a=%b%" & set "b=%a%"
set "x=%a: =" & set "y=!b:* =!" & call set "z=!p!b: !y!=!p!" & (if "!x!" == "1" (set "r=!r! !z!" & set "b=!y!" ) else set "r=!r! 0") & set "x=%" & (if "!x!" == "1" (set "r=!r! !y!") else set "r=!r! 0") & set "%1=!r:~1!"
exit /B

:vectorOpIndexOf
setlocal EnableDelayedExpansion & rem Special case
set "i=0" & set "x=%a: =" & set /A "i+=1,a[!x!]=i" & set "x=%" & set /A "i+=1,a[!x!]=i"
set "x=%b: =" & set /A "y=a[!x!]" & set "r=!r! !y!" & set "x=%" & set /A "y=a[!x!]"
endlocal & set "%1=%r:~1% %y%"
exit /B

:vectorOpMaxMin
set "x=%a: =" & set "y=!b:* =!" & call set "z=!p!b: !y!=!p!" & set /A "s=((!o!)>>31)+1,z=x*s+z*^!s" & set "r=!r! !z!" & set "b=!y!" & set "x=%" & set /A "z=y,s=((!o!)>>31)+1,z=x*s+z*^!s" & set "%1=!r:~1! !z!"
exit /B

:vectorOpScan
set "s=1"
:vectorOpReduction
if "%o%" neq "*" if "%o%" neq "<<" goto notMul
set "z=1" & goto @@
:notMul
if "%o%" neq "+" if "%o%" neq "-" if "%o%" neq "^" if "%o%" neq "|" goto notAdd
set "z=0" & goto @@
:notAdd
if "%o%" equ ">>" set /A "z=0x40000000" & goto @@
if /I "%o%" equ "Max" set /A "z=0x80000000" & goto @@
set /A "z=0x7FFFFFFF"
:@@
set "x=%a: =" & set /A "z!o!=x" & (if defined s set "r=!r! !z!") & set "x=%" & set /A "z!o!=x" & if defined s (set "%1=!r:~1! !z!") else set "%1=!z!"
exit /B

rem Some details missing in Reduction/Scan:
rem The implementation of Max & Min operators is missing
rem The initial values in some cases are not the best ones

rem In some routines a single Scalar number in vector2 doesn't work as intended

Please note that this is a work in progress. There are some minor details in a couple of subroutines that should be fixed, as in the indexOf, "//" and "\\" operators, and also several points that must be designed and then programmed, such as some minor vector subroutines. Three major areas of development remain: subroutines for string (character) management, all the matrix management subroutines (including matrix inversion), and the possibility of performing floating-point arithmetic operations via resident JScript code, as in other projects.

However, I don't have much motivation to do this because sometimes my work goes unnoticed... If enough users reply saying that they are using the package, report bugs and request modifications, then I could change my mind and continue developing this application.

Antonio

PS - If you are interested in this program, you can find additional information at these sites:

- The method used to perform operations on several vector elements in a single line is described in detail in this answer.

- The data and operating scheme used to manage arrays was taken from the APL programming language.

Re: split string into substrings based on delimiter

#2 Post by Aacini » 27 Sep 2021 21:47

I developed a couple of new methods to process two or more variables simultaneously in a more efficient way. Let's start with this example:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "names=A B C D E F G H I J K L M N O P Q R S T U V W X Y Z "
set "values=1851174 310228712 23097674 165929174 945915997 23855547 206119680 110018783 280218208 23864878 963715423 33967149 271921049 72951533 171526681 188716352 893510509 189019180 26048663 214410110 233512550 909623440 83224965 145018318 24411586 214715718 "

rem Standard method to assign each value to each variable name
set "p=%%"
set "x=%names: =" & set "y=!values:* =!" & call set "var!x!=!p!values: !y!=!p!" & set "values=!y!" & set "x=%"

set var

The problem here is that the values variable is very long, and repeatedly reassigning parts of it is not efficient. I wondered whether I could split the values variable into parts without using the slow !values:* =! and call !p!values: !y!=!p! delayed-expansion substrings. The only method I could think of is a FOR /F command nested inside each expansion of the other variable. This is the first attempt:

Code: Select all

set "x=%names: =" & for /F "tokens=1*" %%a in ("!values!") do (set "var!x!=%%a" & set "values=%%b") & set "x=%"

However, this method fails because the percent character in the %%a FOR parameter breaks the original expansion of the x=%names variable.

I thought of replacing the %% part with !p!, that is:

Code: Select all

set "p=%%"
set "x=%names: =" & for /F "tokens=1*" !p!a in ("!values!") do (set "var!x!=!p!a" & set "values=!p!b") & set "x=%"

But the parsing of the FOR command happens before !delayed! expansion, so the !p! part is not valid syntax in a FOR command.

The problem is this: although a combination of !delayed! expansion and other tricks could successfully generate a series of valid commands, such commands are not complete until !delayed! expansion has been carried out, but the parsing of the FOR command happens before that...

If we read the previous paragraph carefully, we find a simple and obvious solution: first use !delayed! expansion to generate the series of commands, but do not execute them! Just store them in a variable. Then, in the next line, execute the contents of that variable.

Of course, because in this case the commands are not executed but stored, certain special characters must be caret-escaped:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "names=A B C D E F G H I J K L M N O P Q R S T U V W X Y Z "
set "values=1851174 310228712 23097674 165929174 945915997 23855547 206119680 110018783 280218208 23864878 963715423 33967149 271921049 72951533 171526681 188716352 893510509 189019180 26048663 214410110 233512550 909623440 83224965 145018318 24411586 214715718 "

rem New method replicating FOR /F commands

rem Assemble the command line by including FOR /F commands
set "p=%%"
SET LINE=set "x=%names: =" ^& for /F "tokens=1*" !p!a in ("^!values^!") do (set "var^!x^!=!p!a" ^& set "values=!p!b") ^& set "x=%"

rem ... and run it
%LINE%

set var

A simple timing test that runs each script repeatedly shows that test1.bat (the first script above) takes 28 seconds, but test2.bat (this one) takes just 12 seconds. Less than half! :)


This second method led me to a new idea: if we can use a FOR /F command, then we could use it to access several elements at once (perhaps all of them), instead of just the "first" and the "rest" in each "iteration". If we can access all the elements at once, we could assemble a long line of assignments that would be processed in a single long SET /A command!

Although this method is more complicated than the previous ones, far fewer commands are executed, so execution should be faster, especially when large amounts of data are processed. This is the implementation of the last method:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem New method using just a couple FOR /F commands and a long SET /A command
rem Antonio Perez Ayala

rem Initialization part: can be performed just once
set  "tokensNames=? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] "
set "tokensValues=_ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } "

rem Use the previous new method to process tokensNames and tokensValues string pairs
rem to create a LINE variable that will generate assignment pairs
set "p=%%"
SET LINE=set "SETS=" ^& set "x=%tokensNames: =" ^& for /F "tokens=1*" !p!a in ("^!TokensValues^!") do (set "SETS=^!SETS^!var!p!^!x^!=!p!!p!a," ^& set "TokensValues=!p!b") ^& set "x=%"

rem Run created LINE variable to generate SETS variable containing %%A=%%a,%%B=%%b,... FOR /F tokens assignments pairs
%LINE%

rem You can use SETS variable from now on

rem =====================================

rem Process data part

rem "names" variable must have an additional "| " element at end
set "names=A B C D E F G H I J K L M N O P Q R S T U V W X Y Z | "
set "values=1851174 310228712 23097674 165929174 945915997 23855547 206119680 110018783 280218208 23864878 963715423 33967149 271921049 72951533 171526681 188716352 893510509 189019180 26048663 214410110 233512550 909623440 83224965 145018318 24411586 214715718 "

rem Match names vs. values and assemble these assignments pairs: varA=1851174,varB=310228712,...,varZ=214715718,var|=,
for /F "tokens=1-31" %%? in ("%names%") do for /F "tokens=1-31" %%_ in ("%values%") do set "VALS=%SETS%"

rem Execute the assignments
set /A "%VALS:,var|=" & REM "%"

set var

Note that this method works on a maximum of 30 elements (FOR /F tokens) only. The previous methods work on an unlimited number of elements, as long as the resulting command line is no longer than 8191 characters. If necessary, this method could be extended to an unlimited number of tokens.

When the whole test3.bat was repeated, the timing test took 18 seconds. However, when the initialization part was performed just once and only the data-processing part was repeated, the timing test took just 5.35 seconds! :shock: Less than 1/5 of the original time! :D

Antonio
