When I created this topic there was a problem: the characters in the 128..255 extended range do not follow the standard sequence when they are used as FOR /F tokens. Besides, the correct sequence was difficult to find and was different for each code page. Later replies explained the problem and showed possible methods to solve it, but doing so still required a lot of testing and work. Finally, the solution given by Dave was very simple and provided a way to use many more tokens than originally planned. Using that solution I could complete my application.
The purpose of my program is to provide a method that allows processing a text file via a FOR /F command using as many tokens as possible, but in a simple way. Note that this application is not just a technical curiosity: a practical use would perhaps not be to process hundreds or thousands of tokens in each line of a very large file, but to process just a few tokens taken from a large file that may contain hundreds of values in each line, like a very large spreadsheet. This is the new version of my application:
Code:
@echo off
setlocal EnableDelayedExpansion
rem MakeForTokens.bat application written by Antonio Perez Ayala
if "%~1" neq "" if "%~1" neq "/?" goto begin
:usage
echo/
echo Create ForTokens.bat file as the skeleton of a program that supports a very
echo large number of tokens via a series of nested FOR /F commands.
echo/
echo MakeForTokens.bat numTokens
echo/
echo The number of tokens must be between 32 and 4094.
echo/
echo After the ForTokens.bat file is created, you must rename and modify it to suit
echo your needs; the program includes full descriptions of the required changes.
echo/
echo If you give 300 as the number of tokens, the generated ForTokens.bat program
echo may run immediately over an example data file that is created.
goto :EOF
:begin
set /A "numTokens=0, numTokens=%~1, numTokensP1=numTokens+1" 2>NUL
if %numTokens% lss 32 goto usage
if %numTokens% gtr 4094 goto usage
for /F "delims=:" %%a in ('findstr /N /B ":Header" "%~F0"') do set "header=%%a"
< "%~F0" (
echo @echo off ^& setlocal EnableDelayedExpansion ^& set "$numTokens=%numTokensP1%"
for /L %%i in (1,1,%header%) do set /P "="
findstr "^"
) > ForTokens.bat
echo ForTokens.bat file created with support for %numTokens% tokens
goto :EOF
rem The following section defines the contents of the ForTokens.bat file
:Header
Rem/For This is a base program that processes a file via a FOR /F command with up to $numTokens tokens
Rem/For This program was created using MakeForTokens.bat application written by Antonio Perez Ayala
Rem/For Step 0:
Rem/For In order to use this program, you should have a data file with many tokens to process.
Rem/For A simple data file with 305 tokens is created here, so the examples below work correctly.
( for /L %%i in (1,1,305) do set /P "={%%i}," ) < NUL > dataFile.txt
Rem/For Step 1:
Rem/For Define the series of auxiliary variables that will be used as FOR tokens.
Rem/For The subroutine uses the value of the $numTokens variable as input.
call :DefineForTokens
Rem/For Step 2: (optional, but recommended)
Rem/For Define an auxiliary variable that will contain the desired tokens when it is %expanded%.
Rem/For This variable is created from a string similar to the original FOR /F "tokens=x,y,m-n" one,
Rem/For but that allows larger token numbers, ranges in descending order and increment greater than 1,
Rem/For and it also returns the number of created tokens. See full description later.
Rem/For In the example below this variable is called "tokens" and it contains these tokens/elements:
Rem/For 10 28 29 30 31 32 170 167 164 161, and "tokens.len" variable is also created with 10.
Rem/For You may or may not enclose the tokens definition between quotes.
call :ExpandTokensString "tokens=10,28-32,170-161-3"
echo Definition: tokens=10,28-32,170-161-3 created %tokens.len% tokens
Rem/For Step 3: (optional)
Rem/For Define the variable with the "delims" value that will be used in the nested FOR's.
Rem/For This variable must be named "delims" and it contains *the definition* of
Rem/For the same part of the FOR command, including the "delims=" word itself.
Rem/For If you want no delims (default: TAB+space), delete "delims" variable: set "delims="
Rem/For If you want "delims=", define: set "delims=delims="
set "delims=delims=,"
Rem/For Step 4:
Rem/For Create the macro that contains the nested FOR's. This step must be performed after both
Rem/For FOR tokens and "delims" variables have been defined and before entering the main FOR /F command.
Rem/For The subroutine uses the value of the $numTokens variable as input.
call :CreateNestedFors
Rem/For Step 5:
Rem/For This is the main FOR /F command that processes the file,
Rem/For there is one additional nested FOR /F command for each 31 tokens (or part).
Rem/For You must include the right filename in the next line:
for /F "usebackq tokens=1-31* %delims%" %%%$1% in ("dataFile.txt") do %NestedFors% (
Rem/For Step 6:
Rem/For Process the tokens. To just show them, use the "tokens" variable defined above:
echo Tokens: %tokens%
Rem/For You may process "tokens" values via a plain FOR command:
for %%a in (%tokens%) do echo Token via FOR: %%a
Rem/For ... or via another FOR /F command:
for /F "tokens=1-%tokens.len%" %%a in ("%tokens%") do (
echo Token #1 of FOR /F: %%a
echo Token #6 of FOR /F: %%f
echo Token #9 of FOR /F: %%i
)
Rem/For You may also directly use the auxiliary "$#" tokens variables. See description below.
echo Token #242 via its token variable: %%%$242%
echo Full path of token #273: %%~F%$273%
Rem/For If there are additional tokens after the $numTokens number used to create this file,
Rem/For they will be grouped into the next token. For example, if this file was created via
Rem/For MakeForTokens.bat 300, then you may show the tokens beyond 300 this way:
echo Additional tokens after the #300: %%%$301%
Rem/For Closing parenthesis of the main FOR /F command
)
goto :EOF
Support subroutines. You must not modify any code below this line,
but all these explanations may be removed.
The next subroutine defines the auxiliary variables that are used to access FOR /F tokens.
These variables are *called* $1, $2, etc., so %$43% is *the token* number 43, and %%%$43% is
*the value* of such a token when this construct is placed inside the FOR /F command.
The usual FOR modifiers may be used: %%~NX%$1284% expands to name and extension of token 1284.
The method to create these variables was originally written by DosTips.com user dbenham. See:
http://www.dostips.com/forum/viewtopic.php?f=3&t=7703&p=51595#p51595
This subroutine does not use SETLOCAL. It modifies and deletes these variables: _cp, _hex, _pages
and uses the value of the $numTokens variable as input.
:DefineForTokens
for /F "tokens=2 delims=:." %%p in ('chcp') do set /A "_cp=%%p, _pages=($numTokens/256+1)*2"
set "_hex= 0 1 2 3 4 5 6 7 8 9 A B C D E F"
call set "_pages=%%_hex:~0,%_pages%%%"
if %$numTokens% gtr 2048 echo Creating FOR tokens variables, please wait . . .
(
echo FF FE
for %%P in (%_pages%) do for %%A in (%_hex%) do for %%B in (%_hex%) do echo %%A%%B 3%%P 0D 00 0A 00
) > "%temp%\forTokens.hex.txt"
certutil.exe -decodehex -f "%temp%\forTokens.hex.txt" "%temp%\forTokens.utf-16le.bom.txt" >NUL
chcp 65001 >NUL
type "%temp%\forTokens.utf-16le.bom.txt" > "%temp%\forTokens.utf8.txt"
(for /L %%N in (0,1,%$numTokens%) do set /P "$%%N=") < "%temp%\forTokens.utf8.txt"
chcp %_cp% >NUL
del "%temp%\forTokens.*.txt"
for %%v in (_cp _hex _pages) do set "%%v="
exit /B
The next subroutine creates the series of nested FOR's that cover all required FOR tokens;
it uses the value of the $numTokens variable as input.
:CreateNestedFors
setlocal EnableDelayedExpansion
set /A "numTokens=$numTokens-1, mod=numTokens%%31, i=numTokens/31, lim=31"
if %mod% equ 0 set "mod=31"
set "NestedFors="
for /L %%i in (32,31,%numTokens%) do (
if !i! equ 1 set "lim=!mod!"
set "NestedFors=!NestedFors! for /F "tokens=1-!lim!* %delims%" %%!$%%i! in ("%%!$%%i!") do"
set /A "i-=1"
)
for /F "delims=" %%a in ("!NestedFors!") do endlocal & set "NestedFors=%%a"
exit /B
The next subroutine expands a tokens definition string into a series of individual $# tokens variables.
The tokens definition string has the same form as the standard FOR /F "tokens=x,y,m-n" one.
Additionally, you may define a tokens range in descending order: "tokens=10-6" produces 10 9 8 7 6
or use an increment different from 1: "tokens=10-25+5,400-200-100" produces 10 15 20 25 400 300 200
When the subroutine ends, the total number of tokens created is stored in a global variable
with the same name as the tokens one, plus ".len" added at the end.
:ExpandTokensString variable=tokens definitions ...
setlocal EnableDelayedExpansion
set "var=" & set "tokens=" & set "len=0"
if "%~2" equ "" (set "params=%~1") else set "params=%*"
for %%a in (!params!) do (
if not defined var (
set "var=%%a"
) else for /F "tokens=1-3 delims=-+" %%i in ("%%a") do (
if "%%j" equ "" (
if %%i lss %$numTokens% set "tokens=!tokens! %%!$%%i!" & set /A len+=1
) else (
if "%%k" equ "" (set "k=1") else set "k=%%k"
if %%i leq %%j (
for /L %%n in (%%i,!k!,%%j) do if %%n lss %$numTokens% set "tokens=!tokens! %%!$%%n!" & set /A len+=1
) else (
for /L %%n in (%%i,-!k!,%%j) do if %%n lss %$numTokens% set "tokens=!tokens! %%!$%%n!" & set /A len+=1
)
)
)
)
endlocal & set "%var%=%tokens%" & set "%var%.len=%len%"
exit /B
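As a cross-check of what :DefineForTokens writes, the following Python sketch rebuilds the same hex dump in memory and decodes it. It only models the byte layout visible in the subroutine above (low byte %%A%%B, high byte 3%%P, then CR LF, all in UTF-16LE); it does not reproduce the certutil, CHCP or TYPE steps, and it assumes those steps preserve the characters. The name token_chars is hypothetical and not part of the application.

```python
HEX_DIGITS = "0123456789ABCDEF"


def token_chars(num_tokens_p1):
    """Model of the characters loaded into $0, $1, ... by :DefineForTokens.

    num_tokens_p1 plays the role of the $numTokens variable.
    """
    pages = num_tokens_p1 // 256 + 1           # same arithmetic as _pages
    raw = bytearray(b"\xff\xfe")               # "FF FE": UTF-16LE byte order mark
    for p in HEX_DIGITS[:pages]:
        for a in HEX_DIGITS:
            for b in HEX_DIGITS:
                # one line of the dump: "%%A%%B 3%%P 0D 00 0A 00"
                raw += bytes.fromhex(f"{a}{b} 3{p} 0D 00 0A 00")
    lines = raw.decode("utf-16").split("\r\n")
    return lines[: num_tokens_p1 + 1]          # set /P reads lines 0..$numTokens


if __name__ == "__main__":
    chars = token_chars(301)
    print(len(chars), hex(ord(chars[43])))
```

In this model the variable $N holds the single character with code point 0x3000 + N, so with the maximum of 4094 tokens the variables stay inside the U+3000..U+3FFF range.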
To use this application, execute it with the desired number of tokens as its parameter, for example: MakeForTokens.bat 300. The application creates a Batch file named ForTokens.bat that contains the code, subroutines and values needed to access that number of tokens, so users just need to insert their own details in the code in order to get a working program. The examples shown in the code are based on a data file with 305 tokens (300 plus 5 additional ones) that is created at the beginning, so if you create ForTokens.bat with 300 tokens, it can run immediately with no modification. This is the output of such a program:
Code:
Definition: tokens=10,28-32,170-161-3 created 10 tokens
Tokens: {10} {28} {29} {30} {31} {32} {170} {167} {164} {161}
Token via FOR: {10}
Token via FOR: {28}
Token via FOR: {29}
Token via FOR: {30}
Token via FOR: {31}
Token via FOR: {32}
Token via FOR: {170}
Token via FOR: {167}
Token via FOR: {164}
Token via FOR: {161}
Token #1 of FOR /F: {10}
Token #6 of FOR /F: {32}
Token #9 of FOR /F: {164}
Token #242 via its token variable: {242}
Full path of token #273: C:\Users\Antonio\Documents\tests\{273}
Additional tokens after the #300: {301},{302},{303},{304},{305},
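The token list in the first lines of this output can be reproduced outside Batch. The Python sketch below is a rough model of the range syntax handled by :ExpandTokensString (single numbers, ascending or descending ranges, optional step after a second "-" or "+"). It takes only the part after "tokens=", returns token numbers instead of FOR variables, and the name expand_tokens is hypothetical.

```python
import re


def expand_tokens(definition, num_tokens_p1):
    """Expand "x,y,m-n,m-n+step" into a list of token numbers.

    Numbers that are not below num_tokens_p1 (the $numTokens value) are
    dropped, mirroring the "lss %$numTokens%" tests in the subroutine.
    """
    result = []
    for item in definition.split(","):
        # "-" and "+" are both plain separators, as in "delims=-+"
        parts = [int(p) for p in re.split(r"[-+]", item.strip()) if p]
        if not parts:
            continue
        if len(parts) == 1:
            candidates = parts
        else:
            first, last = parts[0], parts[1]
            step = parts[2] if len(parts) > 2 else 1
            if first <= last:
                candidates = range(first, last + 1, step)
            else:
                candidates = range(first, last - 1, -step)
        result.extend(n for n in candidates if n < num_tokens_p1)
    return result


if __name__ == "__main__":
    print(expand_tokens("10,28-32,170-161-3", 301))
```

Because the sign before the step is only a separator, the direction of a range is decided by comparing its two end points, which is why "170-161-3" counts downwards.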
If you want the generated ForTokens.bat program to process a different number of tokens, you may modify the value in the first line; just remember that this value must be one more than the base number of tokens. The additional token is used to store the additional values that may appear in the file lines after the base number of tokens has been processed.
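How that first-line value is split over the FOR /F commands can be followed with a small model of the arithmetic in :CreateNestedFors. The Python sketch below is an illustration only: it lists, for the main FOR /F and for each nested one, the first token number and how many tokens it takes before the trailing "*". The name nested_for_plan is hypothetical.

```python
def nested_for_plan(num_tokens_p1):
    """Return (first_token, count) for the main FOR /F and each nested one.

    num_tokens_p1 plays the role of $numTokens (base number of tokens + 1).
    """
    num_tokens = num_tokens_p1 - 1
    mod = num_tokens % 31 or 31        # "if %mod% equ 0 set mod=31"
    remaining = num_tokens // 31       # the counter "i" in the subroutine
    plan = [(1, 31)]                   # main FOR /F uses "tokens=1-31*"
    for first in range(32, num_tokens + 1, 31):
        lim = mod if remaining == 1 else 31
        plan.append((first, lim))
        remaining -= 1
    return plan


if __name__ == "__main__":
    plan = nested_for_plan(301)
    print(len(plan), plan[-1], sum(count for _, count in plan))
```

For a first-line value of 301 the model gives 10 FOR /F commands, the last one starting at token 280 and taking 21 tokens, so exactly 300 tokens are named and everything after them falls into the extra token 301.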
The method implemented in this program allows processing up to 4094 tokens, which is very close to the maximum limit of 4126 tokens that can be managed in a FOR /F command (as described in a previous post). I successfully tested this program on a text file with 4100 tokens (4090 one-character tokens plus the numbers from 4091 to 4100), processed with the maximum of 4094 tokens; the resulting code has 133 nested FOR /F commands and all of them include a "delims=," string. This is the code used to create the 4100-token data file, and the output of the program:
Code:
(
for /L %%i in (1,1,4090) do set /P "=x,"
for /L %%i in (4091,1,4100) do set /P "=%%i,"
echo/
) < NUL > dataFile.txt
Output:
Definition: tokens=1000,2000,3000,4000,4090-4094 created 9 tokens
Tokens: x x x x x 4091 4092 4093 4094
Additional tokens after the #4094: 4095,4096,4097,4098,4099,4100,
However, this test is not complete because it does not confirm that all the tokens in the data file are processed correctly, so I wanted a test that lets me identify any token I wish. This point is important to me because I do not want the same unpleasant surprise to happen again (I developed the first version of this application based on reports from three people who stated that "all characters from 0x80 to 0xFF works correctly as FOR tokens, excepting 0xFF").
Unfortunately, using distinct tokens decreases their maximum number because unique tokens must be longer, so a compromise between token length and number of tokens must be made. I used a "Base 62" number system with the characters "0123456789ABC...XYZabc...xyz" as digits in the 0..61 range, which allows 62*62 = 3844 different values to be written using two characters for each one. This range of values covers the 2771 possible tokens that can be processed when the tokens have two characters each. I wrote this example by creating a ForTokens.bat file with the right number of tokens; then I modified the code to create the data file with two-character tokens and to process the file by converting each token from "Base 62" into the equivalent decimal value. Finally, I removed all comments from the file. This is the result:
Code:
@echo off & setlocal EnableDelayedExpansion & set "$numTokens=2771"
rem Step 0: Create the data file with 2770 tokens comprised of two "Base 62" digits each:
echo Creating the data file, please wait . . .
set "base=0 1 2 3 4 5 6 7 8 9"
set "base=%base% A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"
set "upCase=%base: =%"
set "i=0"
for %%a in (%base%) do set /A "base[%%a]=i, i+=1"
set "base=%base% a b c d e f g h i j k l m n o p q r s t u v w x y z"
set "tokens=2770"
(
set /P "=%base:~2% "
set /A tokens-=61
for %%A in (%base:~2%) do for %%B in (%base%) do if !tokens! gtr 0 (
set /P "=%%A%%B "
set /A tokens-=1
)
echo/
) < NUL > dataFile.txt
rem Step 1: Define the series of auxiliary variables that will be used as FOR tokens:
call :DefineForTokens
rem Step 3: Define the variable with the "delims" value that will be used in the nested FOR's:
set "delims="
rem Step 4: Create the macro that contains the nested FOR's:
rem This is done just once, before entering the next loop
call :CreateNestedFors
rem Loop: Read tokens definitions from the user, expand them and process the resulting tokens
cls
echo/
echo Enter tokens definitions; the valid tokens numbers are in 1..2770 range.
echo/
echo You may insert a tokens range in ascending or descending order with an optional
echo increment/decrement different than 1. For example: tokens=10-6,87,100-500+100
:loop
echo/
set /P "tokens=tokens="
if errorlevel 1 goto :EOF
rem Step 2: Define the auxiliary variable that will contain the desired tokens:
call :ExpandTokensString "tokens=%tokens%"
echo %tokens.len% tokens defined
rem Step 5: This is the main FOR /F command that processes the file:
for /F "usebackq tokens=1-31* %delims%" %%%$1% in ("dataFile.txt") do %NestedFors% (
rem Step 6: Process the tokens:
echo Original: %tokens%
set "line="
for %%a in (%tokens%) do (
set "token=%%a"
set "A=!token:~0,1!" & set "B=!token:~1,1!"
if not defined B set "B=!A!" & set "A=0"
set /A X=base[!A!], Y=base[!B!]
for /F %%c in ("!A!") do if "!upCase:%%c=%%c!" neq "%upCase%" set /A X+=26
for /F %%c in ("!B!") do if "!upCase:%%c=%%c!" neq "%upCase%" set /A Y+=26
set /A num=X*62+Y
set "line=!line! !num!"
)
echo Decimal: !line!
)
goto :loop
:DefineForTokens
for /F "tokens=2 delims=:." %%p in ('chcp') do set /A "_cp=%%p, _pages=($numTokens/256+1)*2"
set "_hex= 0 1 2 3 4 5 6 7 8 9 A B C D E F"
call set "_pages=%%_hex:~0,%_pages%%%"
if %$numTokens% gtr 2048 echo Creating FOR tokens variables, please wait . . .
(
echo FF FE
for %%P in (%_pages%) do for %%A in (%_hex%) do for %%B in (%_hex%) do echo %%A%%B 3%%P 0D 00 0A 00
) > "%temp%\forTokens.hex.txt"
certutil.exe -decodehex -f "%temp%\forTokens.hex.txt" "%temp%\forTokens.utf-16le.bom.txt" >NUL
chcp 65001 >NUL
type "%temp%\forTokens.utf-16le.bom.txt" > "%temp%\forTokens.utf8.txt"
(for /L %%N in (0,1,%$numTokens%) do set /P "$%%N=") < "%temp%\forTokens.utf8.txt"
chcp %_cp% >NUL
del "%temp%\forTokens.*.txt"
for %%v in (_cp _hex _pages) do set "%%v="
exit /B
:CreateNestedFors
setlocal EnableDelayedExpansion
set /A "numTokens=$numTokens-1, mod=numTokens%%31, i=numTokens/31, lim=31"
if %mod% equ 0 set "mod=31"
set "NestedFors="
for /L %%i in (32,31,%numTokens%) do (
if !i! equ 1 set "lim=!mod!"
set "NestedFors=!NestedFors! for /F "tokens=1-!lim!* %delims%" %%!$%%i! in ("%%!$%%i!") do"
set /A "i-=1"
)
for /F "delims=" %%a in ("!NestedFors!") do endlocal & set "NestedFors=%%a"
exit /B
:ExpandTokensString variable=tokens definitions ...
setlocal EnableDelayedExpansion
set "var=" & set "tokens=" & set "len=0"
if "%~2" equ "" (set "params=%~1") else set "params=%*"
for %%a in (!params!) do (
if not defined var (
set "var=%%a"
) else for /F "tokens=1-3 delims=-+" %%i in ("%%a") do (
if "%%j" equ "" (
if %%i lss %$numTokens% set "tokens=!tokens! %%!$%%i!" & set /A len+=1
) else (
if "%%k" equ "" (set "k=1") else set "k=%%k"
if %%i leq %%j (
for /L %%n in (%%i,!k!,%%j) do if %%n lss %$numTokens% set "tokens=!tokens! %%!$%%n!" & set /A len+=1
) else (
for /L %%n in (%%i,-!k!,%%j) do if %%n lss %$numTokens% set "tokens=!tokens! %%!$%%n!" & set /A len+=1
)
)
)
)
endlocal & set "%var%=%tokens%" & set "%var%.len=%len%"
exit /B
Output example:
Code:
Enter tokens definitions; the valid tokens numbers are in 1..2770 range.
You may insert a tokens range in ascending or descending order with an optional
increment/decrement different than 1. For example: tokens=10-6,87,100-500+100
tokens=10-6,87,100-500+100
11 tokens defined
Original: A 9 8 7 6 1P 1c 3E 4q 6S 84
Decimal: 10 9 8 7 6 87 100 200 300 400 500
tokens=500-2500+500
5 tokens defined
Original: 84 G8 OC WG eK
Decimal: 500 1000 1500 2000 2500
tokens=2760-2775
11 tokens defined
Original: iW iX iY iZ ia ib ic id ie if ig
Decimal: 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770
tokens=1,500,1000,2000,2500,2600,2700
7 tokens defined
Original: 1 84 G8 WG eK fw hY
Decimal: 1 500 1000 2000 2500 2600 2700
tokens=
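The "Original" and "Decimal" lines of this output can be cross-checked with a few lines of Python. This is a minimal sketch of the same "Base 62" numbering, assuming the digit order 0-9, A-Z, a-z described earlier and the data file layout in which tokens below 62 are written with a single digit; the names to_base62 and from_base62 are hypothetical.

```python
DIGITS = ("0123456789"
          "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          "abcdefghijklmnopqrstuvwxyz")


def to_base62(n):
    """Text of token number n in the example data file (1 <= n <= 3843)."""
    high, low = divmod(n, 62)
    return DIGITS[low] if high == 0 else DIGITS[high] + DIGITS[low]


def from_base62(text):
    """Decimal value of a one- or two-digit Base 62 token."""
    value = 0
    for ch in text:
        value = value * 62 + DIGITS.index(ch)
    return value


if __name__ == "__main__":
    for n in (10, 87, 100, 500, 2770):
        print(n, to_base62(n), from_base62(to_base62(n)))
```

The Batch version needs its extra "+26" correction because variable names such as base[a] and base[A] are not distinguished by case, so lowercase digits must be detected separately; a case-sensitive lookup like DIGITS.index avoids that step.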
Final note: this program fails when a line of the file has fewer tokens than expected, so that the last nested FOR /F does not process any data. I'll try to solve this problem in the next version...
EDIT 2017-04-24: The third version of this application, which correctly processes lines with a variable number of tokens, is ready. You may download it from this post.
Antonio