Using many "tokens=..." in FOR /F command in a simple way

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Using many "tokens=..." in FOR /F command in a simple way

#46 Post by dbenham » 19 Mar 2017 20:07

Thor wrote:Very nice batch file, I could test up to 1323 tokens correctly.
But beyond that I could not see anything at the command prompt.
My original test was meant to go up to 2100 tokens, but nothing was displayed, so I reduced the count until, at 1323 tokens, the output displayed correctly. From the 1324th token on, nothing appears at the command prompt. I don't know what went wrong, or is that the limit?

I'm almost sure your input line has exceeded the 8191 byte limit.

I have successfully tested up to my macro limit of 2303 (I had mistakenly reported 2304, but that last token is reserved for the remainder of the line that is not parsed). But in order to fit that many tokens within an 8191-character line, I had to limit the width of almost all the tokens to 2 characters (+ 1 delimiter for each token).

2303*3 = 6909.

If you bump the average width to 4 (3-character value + 1 delimiter), then you are up to 9212, and you have blown the limit.
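
The line-length budget above can be checked with SET /A. This is only a rough sketch: the names tokens, limit, and length are made up for illustration, and it assumes every token has the same width and is followed by exactly one delimiter.

```batch
@echo off
setlocal
:: Hypothetical names. Assumes uniform token width plus one delimiter per token.
set /a "tokens=2303, limit=8191"
for %%W in (2 3) do call :report %%W
exit /b

:report  width
:: Total characters = token count * (token width + 1 delimiter)
set /a "length=tokens*(%1+1)"
if %length% gtr %limit% (
  echo width %1: %length% characters - exceeds %limit%
) else (
  echo width %1: %length% characters - fits within %limit%
)
exit /b
```

With a width of 2 the total is 6909 characters, which fits; with a width of 3 it is 9212, which does not.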


penpen wrote:So i've found four (or five) types of characters (tested on Windows 10, 32 bit, german, patches up to date only):
1) I couldn't use these ones for any "for/f"-variable:
- U+0000 "NULL",
- U+000B "LINE TABULATION",
- U+000C "FORM FEED (FF)",
- U+000D "CARRIAGE RETURN (CR)"

The problems with NULL and CARRIAGE RETURN are known. I don't see any hope of ever accessing those positions.

But 0x0B and 0x0C fail for you?

Can you please check them again? In my hands they fall into your category 2 - They cannot be used to define the FOR variable base, but they can be used as "automatic" variables. I don't even have to escape those characters when I access them as automatic variables.


Dave Benham

Re: Using many "tokens=..." in FOR /F command in a simple way

#47 Post by dbenham » 19 Mar 2017 21:06

My :defineFor generated macro is limited in that it requires that all input lines have all (or nearly all) of the requested columns. If a line is missing enough tokens that the last FOR /F has nothing to parse, then no tokens are returned at all!

So I've written two more functions, :defineVariantFor and :defineVariantForChars, that support lines with varying numbers of tokens.
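
The failure mode, and the dummy-token idea used by the variant functions, can be seen in miniature with two chained FOR /F loops. This is a simplified sketch and not the generated macro itself: the sample lines, the 2-token chunk size, and the space delimiter are assumptions chosen to keep it short.

```batch
@echo off
setlocal
:: Simplified sketch: an outer FOR /F takes tokens 1-2 and hands the
:: remainder of the line to an inner FOR /F. Sample lines are hypothetical.
for %%L in ("a b c d" "a b") do (
  echo Input %%L, no dummy token:
  for /f "tokens=1-2*" %%A in ("%%~L") do for /f "tokens=1-2" %%D in ("%%C") do (
    echo   [%%A] [%%B] [%%D] [%%E]
  )
  echo Input %%L, dummy token x prepended:
  for /f "tokens=1-2*" %%A in ("%%~L") do for /f "tokens=1-3" %%D in ("x %%C") do (
    echo   [%%A] [%%B] [%%E] [%%F]
  )
)
```

For the two-token line the first form prints nothing, because the inner FOR /F receives an empty string and never runs its body. The second form still prints [a] [b] [] [], because the dummy "x" guarantees the inner loop always has at least one token to parse.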

In addition, I've fixed some edge case bugs that were in my :defineFor function.

Documentation is embedded at the top of all 4 functions.

Here is a test program that shows all the features and tests the limits of the 4 functions.

Code: Select all

@echo off
setlocal enableDelayedExpansion
cls

:: Define a test line with 305 tokens in the format "{1},{2},{3},...{305},"
set "ln305="
for /l %%N in (1 1 305) do set "ln305=!ln305!{%%N},"

:: Define a test line with 165 tokens in the format "{1},{2},{3},...{165},"
set "ln165="
for /l %%N in (1 1 165) do set "ln165=!ln165!{%%N},"

echo(
echo Try :defineFor macro looking for 300 tokens,
echo using a line that contains 305 tokens
echo(
call :defineFor forMacro A 300 ","
call :testForMacro ln305
echo(
echo -------------------------------------------------------

echo(
echo Try :defineFor macro looking for 300 tokens,
echo using a line that only contains 165 tokens
echo(
call :testForMacro ln165
echo(
echo -------------------------------------------------------

echo(
echo Try :defineVariantFor macro looking for 300 tokens (multiple of 30),
echo using a line that contains 305 tokens
echo(
call :defineVariantFor forMacro A 300 ","
call :testVariantForMacro ln305 300
echo(
echo -------------------------------------------------------

echo(
echo Try :defineVariantFor macro looking for 300 tokens (multiple of 30),
echo using a line that only contains 165 tokens
echo(
call :testVariantForMacro ln165 300
echo(
echo -------------------------------------------------------

echo(
echo Try :defineVariantFor macro looking for 299 tokens (not a multiple of 30),
echo using a line that contains 305 tokens
echo(
call :defineVariantFor forMacro A 299 ","
call :testVariantForMacro ln305 299
echo(
echo -------------------------------------------------------

echo(
echo Try :defineFor macro looking for the maximum allowed 2303 tokens,
echo using a line that contains 2306 tokens.
echo(
set "bigLine="
for /l %%N in (1 1 2302) do set "bigLine=!bigLine!xx,"
set "bigLine=!bigLine!{2303},{2304},{2305},{2306}"
call :defineFor bigFor A 2303 ","
for /f "delims=" %%A in ("!bigLine!") do %bigFor% (
  echo token 2303 = %%%$2303%
  echo token 2304 = %%%$2304%
  echo MaxTokens  = %$max%
)
echo(
echo -------------------------------------------------------

echo(
echo Try :defineVariantFor macro looking for the maximum allowed 2226 tokens,
echo using a line that contains 2229 tokens.
echo(
set "bigLine="
for /l %%N in (1 1 2225) do set "bigLine=!bigLine!xx,"
set "bigLine=!bigLine!{2226},{2227},{2228},{2229}"
call :defineVariantFor bigVariantFor A 2226 ","
for /f "delims=" %%A in ("!bigLine!") do %bigVariantFor% (
  echo token 2226 = %%%$v2226%
  echo token 2227 = %%%$v2227%
  echo vMaxTokens = %$vmax%
)

exit /b

:testForMacro
(for /f "delims=" %%A in ("!%1!") do %forMacro% (
  echo(   %%%$1%   %%%$2%   %%%$3%   %%%$4%   %%%$5%   %%%$6%   %%%$7%   %%%$8%   %%%$9%  %%%$10%
  echo(  %%%$11%  %%%$12%  %%%$13%  %%%$14%  %%%$15%  %%%$16%  %%%$17%  %%%$18%  %%%$19%  %%%$20%
  echo(  %%%$21%  %%%$22%  %%%$23%  %%%$24%  %%%$25%  %%%$26%  %%%$27%  %%%$28%  %%%$29%  %%%$30%
  echo(  %%%$31%  %%%$32%  %%%$33%  %%%$34%  %%%$35%  %%%$36%  %%%$37%  %%%$38%  %%%$39%  %%%$40%
  echo(  %%%$41%  %%%$42%  %%%$43%  %%%$44%  %%%$45%  %%%$46%  %%%$47%  %%%$48%  %%%$49%  %%%$50%
  echo(  %%%$51%  %%%$52%  %%%$53%  %%%$54%  %%%$55%  %%%$56%  %%%$57%  %%%$58%  %%%$59%  %%%$60%
  echo(  %%%$61%  %%%$62%  %%%$63%  %%%$64%  %%%$65%  %%%$66%  %%%$67%  %%%$68%  %%%$69%  %%%$70%
  echo(  %%%$71%  %%%$72%  %%%$73%  %%%$74%  %%%$75%  %%%$76%  %%%$77%  %%%$78%  %%%$79%  %%%$80%
  echo(  %%%$81%  %%%$82%  %%%$83%  %%%$84%  %%%$85%  %%%$86%  %%%$87%  %%%$88%  %%%$89%  %%%$90%
  echo(  %%%$91%  %%%$92%  %%%$93%  %%%$94%  %%%$95%  %%%$96%  %%%$97%  %%%$98%  %%%$99% %%%$100%

  echo( %%%$101% %%%$102% %%%$103% %%%$104% %%%$105% %%%$106% %%%$107% %%%$108% %%%$109% %%%$110%
  echo( %%%$111% %%%$112% %%%$113% %%%$114% %%%$115% %%%$116% %%%$117% %%%$118% %%%$119% %%%$120%
  echo( %%%$121% %%%$122% %%%$123% %%%$124% %%%$125% %%%$126% %%%$127% %%%$128% %%%$129% %%%$130%
  echo( %%%$131% %%%$132% %%%$133% %%%$134% %%%$135% %%%$136% %%%$137% %%%$138% %%%$139% %%%$140%
  echo( %%%$141% %%%$142% %%%$143% %%%$144% %%%$145% %%%$146% %%%$147% %%%$148% %%%$149% %%%$150%
  echo( %%%$151% %%%$152% %%%$153% %%%$154% %%%$155% %%%$156% %%%$157% %%%$158% %%%$159% %%%$160%
  echo( %%%$161% %%%$162% %%%$163% %%%$164% %%%$165% %%%$166% %%%$167% %%%$168% %%%$169% %%%$170%
  echo( %%%$171% %%%$172% %%%$173% %%%$174% %%%$175% %%%$176% %%%$177% %%%$178% %%%$179% %%%$180%
  echo( %%%$181% %%%$182% %%%$183% %%%$184% %%%$185% %%%$186% %%%$187% %%%$188% %%%$189% %%%$190%
  echo( %%%$191% %%%$192% %%%$193% %%%$194% %%%$195% %%%$196% %%%$197% %%%$198% %%%$199% %%%$200%

  echo( %%%$201% %%%$202% %%%$203% %%%$204% %%%$205% %%%$206% %%%$207% %%%$208% %%%$209% %%%$210%
  echo( %%%$211% %%%$212% %%%$213% %%%$214% %%%$215% %%%$216% %%%$217% %%%$218% %%%$219% %%%$220%
  echo( %%%$221% %%%$222% %%%$223% %%%$224% %%%$225% %%%$226% %%%$227% %%%$228% %%%$229% %%%$230%
  echo( %%%$231% %%%$232% %%%$233% %%%$234% %%%$235% %%%$236% %%%$237% %%%$238% %%%$239% %%%$240%
  echo( %%%$241% %%%$242% %%%$243% %%%$244% %%%$245% %%%$246% %%%$247% %%%$248% %%%$249% %%%$250%
  echo( %%%$251% %%%$252% %%%$253% %%%$254% %%%$255% %%%$256% %%%$257% %%%$258% %%%$259% %%%$260%
  echo( %%%$261% %%%$262% %%%$263% %%%$264% %%%$265% %%%$266% %%%$267% %%%$268% %%%$269% %%%$270%
  echo( %%%$271% %%%$272% %%%$273% %%%$274% %%%$275% %%%$276% %%%$277% %%%$278% %%%$279% %%%$280%
  echo( %%%$281% %%%$282% %%%$283% %%%$284% %%%$285% %%%$286% %%%$287% %%%$288% %%%$289% %%%$290%
  echo( %%%$291% %%%$292% %%%$293% %%%$294% %%%$295% %%%$296% %%%$297% %%%$298% %%%$299% %%%$300%
  echo( $301 = %%%$301%
)) || echo NO DATA FOUND
exit /b

:testVariantForMacro
for /f "delims=" %%A in ("!%1!") do %forMacro% (
  echo(   %%%$v1%   %%%$v2%   %%%$v3%   %%%$v4%   %%%$v5%   %%%$v6%   %%%$v7%   %%%$v8%   %%%$v9%  %%%$v10%
  echo(  %%%$v11%  %%%$v12%  %%%$v13%  %%%$v14%  %%%$v15%  %%%$v16%  %%%$v17%  %%%$v18%  %%%$v19%  %%%$v20%
  echo(  %%%$v21%  %%%$v22%  %%%$v23%  %%%$v24%  %%%$v25%  %%%$v26%  %%%$v27%  %%%$v28%  %%%$v29%  %%%$v30%
  echo(  %%%$v31%  %%%$v32%  %%%$v33%  %%%$v34%  %%%$v35%  %%%$v36%  %%%$v37%  %%%$v38%  %%%$v39%  %%%$v40%
  echo(  %%%$v41%  %%%$v42%  %%%$v43%  %%%$v44%  %%%$v45%  %%%$v46%  %%%$v47%  %%%$v48%  %%%$v49%  %%%$v50%
  echo(  %%%$v51%  %%%$v52%  %%%$v53%  %%%$v54%  %%%$v55%  %%%$v56%  %%%$v57%  %%%$v58%  %%%$v59%  %%%$v60%
  echo(  %%%$v61%  %%%$v62%  %%%$v63%  %%%$v64%  %%%$v65%  %%%$v66%  %%%$v67%  %%%$v68%  %%%$v69%  %%%$v70%
  echo(  %%%$v71%  %%%$v72%  %%%$v73%  %%%$v74%  %%%$v75%  %%%$v76%  %%%$v77%  %%%$v78%  %%%$v79%  %%%$v80%
  echo(  %%%$v81%  %%%$v82%  %%%$v83%  %%%$v84%  %%%$v85%  %%%$v86%  %%%$v87%  %%%$v88%  %%%$v89%  %%%$v90%
  echo(  %%%$v91%  %%%$v92%  %%%$v93%  %%%$v94%  %%%$v95%  %%%$v96%  %%%$v97%  %%%$v98%  %%%$v99% %%%$v100%

  echo( %%%$v101% %%%$v102% %%%$v103% %%%$v104% %%%$v105% %%%$v106% %%%$v107% %%%$v108% %%%$v109% %%%$v110%
  echo( %%%$v111% %%%$v112% %%%$v113% %%%$v114% %%%$v115% %%%$v116% %%%$v117% %%%$v118% %%%$v119% %%%$v120%
  echo( %%%$v121% %%%$v122% %%%$v123% %%%$v124% %%%$v125% %%%$v126% %%%$v127% %%%$v128% %%%$v129% %%%$v130%
  echo( %%%$v131% %%%$v132% %%%$v133% %%%$v134% %%%$v135% %%%$v136% %%%$v137% %%%$v138% %%%$v139% %%%$v140%
  echo( %%%$v141% %%%$v142% %%%$v143% %%%$v144% %%%$v145% %%%$v146% %%%$v147% %%%$v148% %%%$v149% %%%$v150%
  echo( %%%$v151% %%%$v152% %%%$v153% %%%$v154% %%%$v155% %%%$v156% %%%$v157% %%%$v158% %%%$v159% %%%$v160%
  echo( %%%$v161% %%%$v162% %%%$v163% %%%$v164% %%%$v165% %%%$v166% %%%$v167% %%%$v168% %%%$v169% %%%$v170%
  echo( %%%$v171% %%%$v172% %%%$v173% %%%$v174% %%%$v175% %%%$v176% %%%$v177% %%%$v178% %%%$v179% %%%$v180%
  echo( %%%$v181% %%%$v182% %%%$v183% %%%$v184% %%%$v185% %%%$v186% %%%$v187% %%%$v188% %%%$v189% %%%$v190%
  echo( %%%$v191% %%%$v192% %%%$v193% %%%$v194% %%%$v195% %%%$v196% %%%$v197% %%%$v198% %%%$v199% %%%$v200%

  echo( %%%$v201% %%%$v202% %%%$v203% %%%$v204% %%%$v205% %%%$v206% %%%$v207% %%%$v208% %%%$v209% %%%$v210%
  echo( %%%$v211% %%%$v212% %%%$v213% %%%$v214% %%%$v215% %%%$v216% %%%$v217% %%%$v218% %%%$v219% %%%$v220%
  echo( %%%$v221% %%%$v222% %%%$v223% %%%$v224% %%%$v225% %%%$v226% %%%$v227% %%%$v228% %%%$v229% %%%$v230%
  echo( %%%$v231% %%%$v232% %%%$v233% %%%$v234% %%%$v235% %%%$v236% %%%$v237% %%%$v238% %%%$v239% %%%$v240%
  echo( %%%$v241% %%%$v242% %%%$v243% %%%$v244% %%%$v245% %%%$v246% %%%$v247% %%%$v248% %%%$v249% %%%$v250%
  echo( %%%$v251% %%%$v252% %%%$v253% %%%$v254% %%%$v255% %%%$v256% %%%$v257% %%%$v258% %%%$v259% %%%$v260%
  echo( %%%$v261% %%%$v262% %%%$v263% %%%$v264% %%%$v265% %%%$v266% %%%$v267% %%%$v268% %%%$v269% %%%$v270%
  echo( %%%$v271% %%%$v272% %%%$v273% %%%$v274% %%%$v275% %%%$v276% %%%$v277% %%%$v278% %%%$v279% %%%$v280%
  echo( %%%$v281% %%%$v282% %%%$v283% %%%$v284% %%%$v285% %%%$v286% %%%$v287% %%%$v288% %%%$v289% %%%$v290%
  echo( %%%$v291% %%%$v292% %%%$v293% %%%$v294% %%%$v295% %%%$v296% %%%$v297% %%%$v298% %%%$v299% %%%$v300%

  if %2==300 echo $vExtra301 = %%%$vExtra301%
  
)
exit /b


:defineFor  ForMacroName  InputVar  TokenCount  [DelimChars]
::
:: Defines a macro to be used for parsing an arbitrary number of tokens from
:: a FOR variable string. The macro always parses one additional token to hold
:: any remainder of the line that lies beyond the TokenCount tokens. The input
:: line should have at least as many tokens as TokenCount, else you run the
:: risk of the macro not returning anything (no tokens parsed).
::
::    ForMacroName = The name of the macro variable to be created.
::
::    InputVar = The name of the FOR variable that contains the string of tokens.
::
::    TokenCount = The number of tokens to parse.
::                 The maximum value is 2303 (256*9-1)
::
::    DelimChars = An optional string of one or more characters, each of which
::                 is treated as a token delimiter. Default is "<tab><space>".
::                 If <space> is included in the string, then it must be the
::                 last character in the string.
::
:: Tokens are accessed by $n variables.
:: For example, %%%$45% would represent the 45th token.
::
:: FOR /F modifiers may be freely used. For example, %%~nx%$10% would treat the
:: 10th token as a file path, and would expand to the file name and extension.
::
:: Normally, a single FOR /F is limited to 31 tokens, but the macro supports
:: many more, theoretically as many as 2303. However, each line to be parsed
:: must be less than 8191 characters in length.
::
:: This function may be called with delayed expansion enabled or disabled.
:: It is generally recommended that the macro be used with delayed expansion
:: disabled so that tokens containing ! are not corrupted.
::
:: This function automatically calls :defineForChars to define enough $n
:: variables to satisfy the TokenCount+1 tokens.
::
:: Example usage - Suppose you want to parse a well behaved CSV file named
:: test.csv that contains 300 columns. All lines must have the same number of
:: columns, and no column value may contain a comma.
::
:: The following code will correctly parse each data line of test.csv:
::
::    @echo off
::    setlocal disableDelayedExpansion
::    call :defineFor For300InA A 300 ","
::    for /f "skip=1 delims=" %%A in (test.csv) do %For300InA% (
::      echo token   1 = %%%$1%
::      echo token   2 = %%%$2%
::      echo ...
::      echo token 300 = %%%$300%
::    )
::
:: If the first token might begin with any character, including the default
:: EOL character, then the FOR /F line should be changed as follows:
::
::    for /f skip^=1^ delims^=^ eol^= %%A in (test.csv) do %For300InA% (
::   
if %$max%0 gtr %~30 goto :defineForInternal
set /a "$max=(%~3+256)/256"
call :defineForChars %$max%
:defineForInternal
setlocal enableDelayedExpansion
set "delims=%~4"
if not defined delims set "delims= "
set "in=%~2"
set "macro="
set /a max=31, end=0
for /l %%N in (1 31 %~3) do (
  if %%N neq 1 set "in=!$%%N!"
  set /a end+=31
  if !end! gtr %~3 set /a "max=%~3-%%N+1"
  set "macro=!macro! for /f "eol=!delims:~0,1! tokens=1-!max!* delims=!delims!" %%!$%%N! in ("%%!in!") do"
)
for /f "delims=" %%A in ("!macro! ") do endlocal & set "%~1=%%A"
exit /b


:defineForChars  Count
::
:: Defines variables to be used as FOR /F tokens, from $1 to $n,
:: where n = Count*256
:: Also defines $max = Count*256.
:: No other variables are defined or tampered with.
::
:: The maximum allowed Count is 9, meaning the max $max is 2304.
::
:: Once defined, the variables are very useful for parsing lines with a fixed
:: number of tokens > 31, as the values are guaranteed to be contiguous within
:: the FOR /F mapping scheme.
::
:: For example, you can use $1 as a FOR variable by using %%%$1%.
::
::   FOR /F "TOKENS=1-31" %%%$1% IN (....) DO ...
::
::      %%%$1% = token 1, %%%$2% = token 2, ... %%%$31% = token 31
::
:: This routine never uses SETLOCAL, and works regardless whether delayed expansion
:: is enabled or disabled.
::
:: Three temporary files are created and deleted in the %TEMP% folder, and the active
:: code page is temporarily set to 65001, and then restored to the starting value
:: before returning. Once defined, the $n variables can be used with any code page.
::
for /f "tokens=2 delims=:." %%P in ('chcp') do call :DefineForCharsInternal %1
exit /b
:defineForCharsInternal
set /a $max=%1*256
>"%temp%\forVariables.%~1.hex.txt" (
  echo FF FE
  for %%H in (
    "0 1 2 3 4 5 6 7 8 9 A B C D E F"
  ) do for /l %%N in (1 1 %~1) do for %%A in (%%~H) do for %%B in (%%~H) do (
    echo %%A%%B 0%%N 0D 00 0A 00
  )
)
>nul certutil.exe -decodehex -f "%temp%\forVariables.%~1.hex.txt" "%temp%\forVariables.%~1.utf-16le.bom.txt"
>nul chcp 65001
>"%temp%\forVariables.%~1.utf8.txt" type "%temp%\forVariables.%~1.utf-16le.bom.txt"
<"%temp%\forVariables.%~1.utf8.txt" (for /l %%N in (1 1 %$max%) do set /p "$%%N=")
for %%. in (dummy) do >nul chcp %%P  
del "%temp%\forVariables.%~1.*.txt"
exit /b


:defineVariantFor  ForMacroName  InputVar  TokenCount  [DelimChars  [DummyChar]]
::
:: Nearly the same as :DefineFor, except the resultant macro is designed to work
:: with input lines that may have fewer than TokenCount tokens.
::
::    ForMacroName = The name of the macro variable to be created.
::
::    InputVar = The name of the FOR variable that contains the string of tokens.
::
::    TokenCount = The maximum number of tokens to parse.
::                 The maximum value is 2226.
::
::    DelimChars = An optional string of one or more characters, each of which
::                 is treated as a token delimiter. Default is "<tab><space>".
::                 If <space> is included in the string, then it must be the
::                 last character in the string.
::
::    DummyChar  = An optional string to use as a prepended dummy token for each
::                 FOR /F iteration. The DummyChar is needed to support the
::                 requested number of tokens, yet still have the macro return
::                 results if the input line has fewer than TokenCount tokens.
::                 The DummyChar must not match any character within DelimChars.
::                 The default is "x". This optional parameter is only needed if
::                 DelimChars includes "x".
::
:: Tokens are accessed by $vn variables.
:: For example, %%%$v45% would represent the 45th token.
::
:: Any remaining unparsed string is contained in one of two possible tokens:
::   - If TokenCount is divisible by 30,
::     then the remainder is in $vExtra(TokenCount+1)
::   - Else the remainder is in $v(TokenCount+1)
::
:: So if TokenCount=300 (a multiple of 30), then the remainder is in $vExtra301.
:: But if TokenCount=299 (not a multiple of 30), then the remainder is in $v300.
::
:: This function automatically calls :defineVariantForChars to define enough $vn
:: and $vExtran variables to satisfy the TokenCount+1 requirement.
::
:: All other behaviors are the same as for :defineFor
::
if %$vmax%0 gtr %~30 goto :defineVariantForInternal
set /a "$vmax=(%~3+(%~3+30)/30+257)/256"
call :defineVariantForChars %$vmax%
:defineVariantForInternal
setlocal enableDelayedExpansion
set "dummy=%~5"
if not defined dummy set "dummy=x"
set "delims=%~4"
if not defined delims set "delims= "
set "dummy=!dummy!!delims:~0,1!"
set "in=%~2"
set "macro="
set /a max=31, end=0
for /l %%N in (1 30 %~3) do (
  if %%N neq 1 set "in=!$vExtra%%N!"
  set /a end+=30
  if !end! gtr %~3 set /a "max=%~3-%%N+2"
  set "macro=!macro! for /f "eol=!delims:~0,1! tokens=1-!max!* delims=!delims!" %%!$vExtra%%N! in ("!dummy!%%!in!") do"
)
for /f "delims=" %%A in ("!macro! ") do endlocal & set "%~1=%%A"
exit /b


:defineVariantForChars  Count
::
:: Nearly the same as :defineForChars, except this function defines $vn, $vmax,
:: and $vExtran variables instead of $n and $max variables.
::
:: These variables are useful for parsing lines with many tokens when the number
:: of tokens may vary on each line.
::
:: For every 30 $vn variables, there is one $vExtran variable defined.
::
:: The maximum Count allowed is 9, which results in a max $vmax of 2227, which
:: corresponds to the largest $vn of $v2227.
::
:: All other behaviors are the same as for :defineForChars.
::
for /f "tokens=2 delims=:." %%P in ('chcp') do call :DefineVariantForCharsInternal %1
exit /b
:defineVariantForCharsInternal
set /a "$vmax=%1*256, $vmax-=($vmax+30)/30"
>"%temp%\forVariables.%~1.hex.txt" (
  echo FF FE
  for %%H in (
    "0 1 2 3 4 5 6 7 8 9 A B C D E F"
  ) do for /l %%N in (1 1 %~1) do for %%A in (%%~H) do for %%B in (%%~H) do (
    echo %%A%%B 0%%N 0D 00 0A 00
  )
)
>nul certutil.exe -decodehex -f "%temp%\forVariables.%~1.hex.txt" "%temp%\forVariables.%~1.utf-16le.bom.txt"
>nul chcp 65001
>"%temp%\forVariables.%~1.utf8.txt" type "%temp%\forVariables.%~1.utf-16le.bom.txt"
<"%temp%\forVariables.%~1.utf8.txt" (for /l %%N in (1 1 %$vmax%) do (
  2>nul set /a "1/((%%N-1)%% 30)" || set /p "$vExtra%%N="
  set /p "$v%%N="
))
for %%. in (dummy) do >nul chcp %%P  
del "%temp%\forVariables.%~1.*.txt"
exit /b
--OUTPUT--

Code: Select all


Try :defineFor macro looking for 300 tokens,
using a line that contains 305 tokens

   {1}   {2}   {3}   {4}   {5}   {6}   {7}   {8}   {9}  {10}
  {11}  {12}  {13}  {14}  {15}  {16}  {17}  {18}  {19}  {20}
  {21}  {22}  {23}  {24}  {25}  {26}  {27}  {28}  {29}  {30}
  {31}  {32}  {33}  {34}  {35}  {36}  {37}  {38}  {39}  {40}
  {41}  {42}  {43}  {44}  {45}  {46}  {47}  {48}  {49}  {50}
  {51}  {52}  {53}  {54}  {55}  {56}  {57}  {58}  {59}  {60}
  {61}  {62}  {63}  {64}  {65}  {66}  {67}  {68}  {69}  {70}
  {71}  {72}  {73}  {74}  {75}  {76}  {77}  {78}  {79}  {80}
  {81}  {82}  {83}  {84}  {85}  {86}  {87}  {88}  {89}  {90}
  {91}  {92}  {93}  {94}  {95}  {96}  {97}  {98}  {99} {100}
 {101} {102} {103} {104} {105} {106} {107} {108} {109} {110}
 {111} {112} {113} {114} {115} {116} {117} {118} {119} {120}
 {121} {122} {123} {124} {125} {126} {127} {128} {129} {130}
 {131} {132} {133} {134} {135} {136} {137} {138} {139} {140}
 {141} {142} {143} {144} {145} {146} {147} {148} {149} {150}
 {151} {152} {153} {154} {155} {156} {157} {158} {159} {160}
 {161} {162} {163} {164} {165} {166} {167} {168} {169} {170}
 {171} {172} {173} {174} {175} {176} {177} {178} {179} {180}
 {181} {182} {183} {184} {185} {186} {187} {188} {189} {190}
 {191} {192} {193} {194} {195} {196} {197} {198} {199} {200}
 {201} {202} {203} {204} {205} {206} {207} {208} {209} {210}
 {211} {212} {213} {214} {215} {216} {217} {218} {219} {220}
 {221} {222} {223} {224} {225} {226} {227} {228} {229} {230}
 {231} {232} {233} {234} {235} {236} {237} {238} {239} {240}
 {241} {242} {243} {244} {245} {246} {247} {248} {249} {250}
 {251} {252} {253} {254} {255} {256} {257} {258} {259} {260}
 {261} {262} {263} {264} {265} {266} {267} {268} {269} {270}
 {271} {272} {273} {274} {275} {276} {277} {278} {279} {280}
 {281} {282} {283} {284} {285} {286} {287} {288} {289} {290}
 {291} {292} {293} {294} {295} {296} {297} {298} {299} {300}
 $301 = {301},{302},{303},{304},{305},

-------------------------------------------------------

Try :defineFor macro looking for 300 tokens,
using a line that only contains 165 tokens

NO DATA FOUND

-------------------------------------------------------

Try :defineVariantFor macro looking for 300 tokens (multiple of 30),
using a line that contains 305 tokens

   {1}   {2}   {3}   {4}   {5}   {6}   {7}   {8}   {9}  {10}
  {11}  {12}  {13}  {14}  {15}  {16}  {17}  {18}  {19}  {20}
  {21}  {22}  {23}  {24}  {25}  {26}  {27}  {28}  {29}  {30}
  {31}  {32}  {33}  {34}  {35}  {36}  {37}  {38}  {39}  {40}
  {41}  {42}  {43}  {44}  {45}  {46}  {47}  {48}  {49}  {50}
  {51}  {52}  {53}  {54}  {55}  {56}  {57}  {58}  {59}  {60}
  {61}  {62}  {63}  {64}  {65}  {66}  {67}  {68}  {69}  {70}
  {71}  {72}  {73}  {74}  {75}  {76}  {77}  {78}  {79}  {80}
  {81}  {82}  {83}  {84}  {85}  {86}  {87}  {88}  {89}  {90}
  {91}  {92}  {93}  {94}  {95}  {96}  {97}  {98}  {99} {100}
 {101} {102} {103} {104} {105} {106} {107} {108} {109} {110}
 {111} {112} {113} {114} {115} {116} {117} {118} {119} {120}
 {121} {122} {123} {124} {125} {126} {127} {128} {129} {130}
 {131} {132} {133} {134} {135} {136} {137} {138} {139} {140}
 {141} {142} {143} {144} {145} {146} {147} {148} {149} {150}
 {151} {152} {153} {154} {155} {156} {157} {158} {159} {160}
 {161} {162} {163} {164} {165} {166} {167} {168} {169} {170}
 {171} {172} {173} {174} {175} {176} {177} {178} {179} {180}
 {181} {182} {183} {184} {185} {186} {187} {188} {189} {190}
 {191} {192} {193} {194} {195} {196} {197} {198} {199} {200}
 {201} {202} {203} {204} {205} {206} {207} {208} {209} {210}
 {211} {212} {213} {214} {215} {216} {217} {218} {219} {220}
 {221} {222} {223} {224} {225} {226} {227} {228} {229} {230}
 {231} {232} {233} {234} {235} {236} {237} {238} {239} {240}
 {241} {242} {243} {244} {245} {246} {247} {248} {249} {250}
 {251} {252} {253} {254} {255} {256} {257} {258} {259} {260}
 {261} {262} {263} {264} {265} {266} {267} {268} {269} {270}
 {271} {272} {273} {274} {275} {276} {277} {278} {279} {280}
 {281} {282} {283} {284} {285} {286} {287} {288} {289} {290}
 {291} {292} {293} {294} {295} {296} {297} {298} {299} {300}
$vExtra301 = {301},{302},{303},{304},{305},

-------------------------------------------------------

Try :defineVariantFor macro looking for 300 tokens (multiple of 30),
using a line that only contains 165 tokens

   {1}   {2}   {3}   {4}   {5}   {6}   {7}   {8}   {9}  {10}
  {11}  {12}  {13}  {14}  {15}  {16}  {17}  {18}  {19}  {20}
  {21}  {22}  {23}  {24}  {25}  {26}  {27}  {28}  {29}  {30}
  {31}  {32}  {33}  {34}  {35}  {36}  {37}  {38}  {39}  {40}
  {41}  {42}  {43}  {44}  {45}  {46}  {47}  {48}  {49}  {50}
  {51}  {52}  {53}  {54}  {55}  {56}  {57}  {58}  {59}  {60}
  {61}  {62}  {63}  {64}  {65}  {66}  {67}  {68}  {69}  {70}
  {71}  {72}  {73}  {74}  {75}  {76}  {77}  {78}  {79}  {80}
  {81}  {82}  {83}  {84}  {85}  {86}  {87}  {88}  {89}  {90}
  {91}  {92}  {93}  {94}  {95}  {96}  {97}  {98}  {99} {100}
 {101} {102} {103} {104} {105} {106} {107} {108} {109} {110}
 {111} {112} {113} {114} {115} {116} {117} {118} {119} {120}
 {121} {122} {123} {124} {125} {126} {127} {128} {129} {130}
 {131} {132} {133} {134} {135} {136} {137} {138} {139} {140}
 {141} {142} {143} {144} {145} {146} {147} {148} {149} {150}
 {151} {152} {153} {154} {155} {156} {157} {158} {159} {160}
 {161} {162} {163} {164} {165}













$vExtra301 =

-------------------------------------------------------

Try :defineVariantFor macro looking for 299 tokens (not a multiple of 30),
using a line that contains 305 tokens

   {1}   {2}   {3}   {4}   {5}   {6}   {7}   {8}   {9}  {10}
  {11}  {12}  {13}  {14}  {15}  {16}  {17}  {18}  {19}  {20}
  {21}  {22}  {23}  {24}  {25}  {26}  {27}  {28}  {29}  {30}
  {31}  {32}  {33}  {34}  {35}  {36}  {37}  {38}  {39}  {40}
  {41}  {42}  {43}  {44}  {45}  {46}  {47}  {48}  {49}  {50}
  {51}  {52}  {53}  {54}  {55}  {56}  {57}  {58}  {59}  {60}
  {61}  {62}  {63}  {64}  {65}  {66}  {67}  {68}  {69}  {70}
  {71}  {72}  {73}  {74}  {75}  {76}  {77}  {78}  {79}  {80}
  {81}  {82}  {83}  {84}  {85}  {86}  {87}  {88}  {89}  {90}
  {91}  {92}  {93}  {94}  {95}  {96}  {97}  {98}  {99} {100}
 {101} {102} {103} {104} {105} {106} {107} {108} {109} {110}
 {111} {112} {113} {114} {115} {116} {117} {118} {119} {120}
 {121} {122} {123} {124} {125} {126} {127} {128} {129} {130}
 {131} {132} {133} {134} {135} {136} {137} {138} {139} {140}
 {141} {142} {143} {144} {145} {146} {147} {148} {149} {150}
 {151} {152} {153} {154} {155} {156} {157} {158} {159} {160}
 {161} {162} {163} {164} {165} {166} {167} {168} {169} {170}
 {171} {172} {173} {174} {175} {176} {177} {178} {179} {180}
 {181} {182} {183} {184} {185} {186} {187} {188} {189} {190}
 {191} {192} {193} {194} {195} {196} {197} {198} {199} {200}
 {201} {202} {203} {204} {205} {206} {207} {208} {209} {210}
 {211} {212} {213} {214} {215} {216} {217} {218} {219} {220}
 {221} {222} {223} {224} {225} {226} {227} {228} {229} {230}
 {231} {232} {233} {234} {235} {236} {237} {238} {239} {240}
 {241} {242} {243} {244} {245} {246} {247} {248} {249} {250}
 {251} {252} {253} {254} {255} {256} {257} {258} {259} {260}
 {261} {262} {263} {264} {265} {266} {267} {268} {269} {270}
 {271} {272} {273} {274} {275} {276} {277} {278} {279} {280}
 {281} {282} {283} {284} {285} {286} {287} {288} {289} {290}
 {291} {292} {293} {294} {295} {296} {297} {298} {299} {300},{301},{302},{303},{304},{305},

-------------------------------------------------------

Try :defineFor macro looking for the maximum allowed 2303 tokens,
using a line that contains 2306 tokens.

token 2303 = {2303}
token 2304 = {2304},{2305},{2306}
MaxTokens  = 2304

-------------------------------------------------------

Try :defineVariantFor macro looking for the maximum allowed 2226 tokens,
using a line that contains 2229 tokens.

token 2226 = {2226}
token 2227 = {2227},{2228},{2229}
vMaxTokens = 2227
Dave Benham

Thor
Posts: 43
Joined: 31 Mar 2016 15:02

Re: Using many "tokens=..." in FOR /F command in a simple way

#48 Post by Thor » 20 Mar 2017 00:59

dbenham wrote:I'm almost sure your input line has exceeded the 8191 byte limit.

I have successfully tested up to my macro limit of 2303 (I had mistakenly reported 2304, but that last token is reserved for the remainder of the line that is not parsed). But in order to fit that many tokens within an 8191-character line, I had to limit the width of almost all the tokens to 2 characters (+ 1 delimiter for each token).

2303*3 = 6909.

If you bump the average width to 4 (3-character value + 1 delimiter), then you are up to 9212, and you have blown the limit.

You're definitely right. I had exceeded the 8191 limit. Using your technique I could display the last hundred or so tokens (2200-2303) without any problem.

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Using many "tokens=..." in FOR /F command in a simple way

#49 Post by penpen » 20 Mar 2017 08:41

dbenham wrote:But 0x0B and 0x0C fail for you?

Can you please check them again? In my hands they fall into your category 2

Thanks for finding this error; you're right:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion
for /f "tokens=2 delims=:." %%a in ('chcp') do set "cp=%%~a"
>nul chcp 65001
>"check.utf8.bom.hex.txt" echo(EF BB BF 23 08 09 0A 0B 0C 23
>nul certutil.exe -decodehex -f "check.utf8.bom.hex.txt" "check.utf8.bom.txt"

<"check.utf8.bom.txt" (
   set "v="
   set /p "v="
   set "v=!v:~2,-1!"
)

for /f "tokens=1-17" %%%v:~0,1% in ("@08 @09 @0A @0B @0C") do (
   echo( %%~%v:~0,1% %%~%v:~1,1% %%~^%v:~2,1%%v:~2,1% %%~%v:~3,1% %%~%v:~4,1%
)

>nul chcp %cp%
endlocal
goto :eof

Last night I used the same test for multiple variables, but it did not double the character for the variable at position U+000A, so the rest of that line was dropped, and I didn't notice the error.
But I will correct the above list.


penpen

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Using many "tokens=..." in FOR /F command in a simple way

#50 Post by Aacini » 24 Mar 2017 21:54

Maximum line length and number of tokens in FOR /F command

If you had asked me before, "What is the maximum line length that a FOR /F command can process?", I would have answered "8191 bytes". Why? Because I have read that figure in several places many times. I had never run a complete test on this point because I never needed to reach the "tokens=..." limit. However, the techniques developed in this topic make it possible to use a very large number of "tokens=...", so it is now important to know precisely the maximum number of tokens that a FOR /F command can process.

Taking a limit of 8191 characters and using one-character tokens separated by one-character delimiters, it would be possible to have a maximum of 4096 tokens. I wrote this code for my tests:

Code: Select all

@echo off
setlocal

set /A tokens=4096,  n=tokens-1

(
for /L %%i in (1,1,%n%) do set /P "=A "
echo Z
) < NUL > forA.txt
 
(for /F "tokens=1*" %%A in (forA.txt) do (
   set /P "=%%A "
   echo %%B
)) < NUL > forB.txt

This program first creates the forA.txt file with the given number of tokens and then tries to duplicate it into forB.txt using the FOR /F "tokens=..." command. If the duplicate is identical to the original, that shows the FOR /F command is processing the tokens from forA.txt correctly.

This base code works as it should: the created forB.txt file is identical to forA.txt. If we increase the number of tokens to 4097, forB.txt is created with no CR+LF at the end. If the number is further increased to 4098, forB.txt ends up as an empty file.

However, with the same 4098 tokens, if the second FOR command uses three tokens to access the file instead of two:

Code: Select all

(for /F "tokens=1-2*" %%A in (forA.txt) do (
   set /P "=%%A %%B "
   echo %%C
)) < NUL > forB.txt

... then the forB.txt file is again created with no CR+LF at the end. And with four tokens:

Code: Select all

(for /F "tokens=1-3*" %%A in (forA.txt) do (
   set /P "=%%A %%B %%C "
   echo %%D
)) < NUL > forB.txt

... the forB.txt file is again created correctly with 4098 tokens. This means that the maximum number of tokens that a FOR /F command can process is NOT limited to lines of 8191 bytes, but depends on the number of remaining tokens that end up in the last "*" token.

If we use "tokens=1-31*" and the corresponding individual tokens, then forB.txt can be created correctly from a line of 8251 bytes that contains 4126 tokens, which is the maximum possible number.
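
One way to see the pattern in these tests is to compute how long the "*" remainder is in each case. The sketch below is only a model written in Python (it is not part of the Batch code), and it assumes one-character tokens separated by one-character delimiters, as in forA.txt. The function names are made up for illustration.

```python
def line_length(total_tokens):
    """Length of a line of one-character tokens joined by single spaces."""
    return 2 * total_tokens - 1


def rest_length(total_tokens, named_tokens):
    """Length of the '*' remainder after `named_tokens` tokens are split off."""
    return line_length(total_tokens - named_tokens)


# (total tokens, tokens named before "*", result reported in the post)
cases = [
    (4096, 1, "identical copy"),
    (4097, 1, "no CR+LF at end"),
    (4098, 1, "empty file"),
    (4098, 2, "no CR+LF at end"),
    (4098, 3, "identical copy"),
    (4126, 31, "identical copy"),
]
for total, named, reported in cases:
    print(total, named, line_length(total), rest_length(total, named), reported)
```

In every case reported as an identical copy the remainder is 8189 characters long, while in the failing cases it is 8191 or more. That agrees with the idea that the limit applies to the command that uses the last token and not to the whole line.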



However, the maximum line length that can be processed also depends on the length of the tokens. The code below creates forA.txt with 31 tokens of 272 characters each, plus one character for the separator, so the file has one line of 273 x 31 = 8463 bytes (one token more after 8191 bytes). The forB.txt file is created correctly:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "token="
for /L %%i in (1,1,272) do set "token=!token!A"

(
for /L %%i in (1,1,31) do set /P "=%token% "
echo/
) < NUL > forA.txt
 

(for /F "tokens=1-31" %%@ in (forA.txt) do (
   set /P "=%%@ %%A %%B %%C %%D %%E %%F %%G %%H %%I %%J %%K %%L %%M %%N "
   set /P "=%%O %%P %%Q %%R %%S %%T %%U %%V %%W %%X %%Y %%Z %%[ %%\ %%] "
   echo %%^^
)) < NUL > forB.txt

If the trend of the previous tests is understood, then the next test should be obvious. The code below creates a file that contains 32 tokens of 8155 characters each, that is, 260992 bytes in a single line. The forB.txt file is created correctly.

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "token="
for /L %%i in (1,1,8155) do set "token=!token!A"

(
for /L %%i in (1,1,32) do set /P "=%token% "
echo/
) < NUL > forA.txt
 

(for /F "tokens=1-31*" %%@ in (forA.txt) do (
   set /P "=%%@ "
   set /P "=%%A "
   set /P "=%%B "
   set /P "=%%C "
   set /P "=%%D "
   set /P "=%%E "
   set /P "=%%F "
   set /P "=%%G "
   set /P "=%%H "
   set /P "=%%I "
   set /P "=%%J "
   set /P "=%%K "
   set /P "=%%L "
   set /P "=%%M "
   set /P "=%%N "
   set /P "=%%O "
   set /P "=%%P "
   set /P "=%%Q "
   set /P "=%%R "
   set /P "=%%S "
   set /P "=%%T "
   set /P "=%%U "
   set /P "=%%V "
   set /P "=%%W "
   set /P "=%%X "
   set /P "=%%Y "
   set /P "=%%Z "
   set /P "=%%[ "
   set /P "=%%\ "
   set /P "=%%] "
   set /P "=%%^ "
   echo %%_
)) < NUL > forB.txt

In conclusion, these are the

Operation rules of FOR /F command about "tokens=..." option:

  • The maximum number of tokens in a single FOR /F command is 32, including the final "rest of tokens" one: "tokens=1-31*".
  • The maximum length of each token is limited by the maximum length of the command line that uses it, which is 8191 bytes. Shorter commands allow longer tokens.
  • The maximum number of tokens in a line of a text file is 4126, when every token is just one character long and the command that processes the final "rest of tokens" token is not too long. If the tokens after the 31st are longer, the maximum number decreases accordingly, because the "rest of tokens" token must always fit within its 8191-byte command line.
  • The maximum length of a line of a text file may be close to 261000 bytes, when all 32 possible tokens have a length close to 8191 bytes.
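
The line lengths quoted in the two long-token tests can be rechecked with a few lines of Python. This is only arithmetic, under the assumption that every token is followed by exactly one delimiter (as the `set /P "=%token% "` commands produce) and that the final CR+LF is not counted. The function name is hypothetical.

```python
def line_bytes(token_count, token_length, delimiter_length=1):
    """Bytes in one line made of equal-length tokens, each followed by a delimiter."""
    return token_count * (token_length + delimiter_length)


print(line_bytes(31, 272))    # the 31-token test: 8463
print(line_bytes(32, 8155))   # the 32-token test: 260992
```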

Antonio

#51 Post by Aacini » 24 Mar 2017 22:06

When I created this topic there was a problem: the characters in the 128..255 extended range do not follow the standard sequence when they are used as FOR /F tokens. Besides, the correct sequence was difficult to find and was different for each code page. Later, interesting replies explained the problem and showed possible methods to solve it, but that still required a lot of testing and work. Finally, the solution given by Dave was very simple and provided a way to use many more tokens than originally planned. Using that solution I could complete my application.

The purpose of my program is to assemble a method that allows a text file to be processed via a FOR /F command using many tokens, as many as possible, but in a simple way. Note that this application is not just a technical curiosity: a practical use would perhaps not be to process hundreds or thousands of tokens in each line of a very large file, but to process just a few tokens taken from a large file that may contain hundreds of values in each line, like a very large spreadsheet. This is the new version of my application:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem MakeForTokens.bat application written by Antonio Perez Ayala

if "%~1" neq "" if "%~1" neq "/?" goto begin

:usage
echo/
echo Create ForTokens.bat file as the skeleton of a program that supports a very
echo large number of tokens via a series of nested FOR /F commands.
echo/
echo    MakeForTokens.bat numTokens
echo/
echo The number of tokens must be between 32 and 4094.
echo/
echo After the ForTokens.bat file is created, you must rename and modify it to suit
echo your needs; the program includes full descriptions of the required changes.
echo/
echo If you give 300 in the number of tokens, the generated ForTokens.bat program
echo may run immediately over an example data file that is created.
goto :EOF


:begin
set /A "numTokens=0, numTokens=%~1, numTokensP1=numTokens+1" 2>NUL
if %numTokens% lss 32 goto usage
if %numTokens% gtr 4094 goto usage

for /F "delims=:" %%a in ('findstr /N /B ":Header" "%~F0"') do set "header=%%a"
< "%~F0" (
   echo @echo off ^& setlocal EnableDelayedExpansion ^& set "$numTokens=%numTokensP1%"
   for /L %%i in (1,1,%header%) do set /P "="
   findstr "^"
) > ForTokens.bat
echo ForTokens.bat file created with support for %numTokens% tokens
goto :EOF


rem The following section define the contents of the ForTokens.bat file

:Header

Rem/For  This is a base program that processes a file via FOR /F command with up to $numTokens tokens
Rem/For  This program was created using MakeForTokens.bat application written by Antonio Perez Ayala

Rem/For  Step 0:
Rem/For    In order to use this program, you should have a data file with many tokens to process.
Rem/For    A simple data file with 305 tokens is created here, so the examples below work correctly.
( for /L %%i in (1,1,305) do set /P "={%%i}," ) < NUL > dataFile.txt

Rem/For  Step 1:
Rem/For    Define the series of auxiliary variables that will be used as FOR tokens.
Rem/For    The subroutine use the value of $numTokens variable as input.
call :DefineForTokens

Rem/For  Step 2: (optional, but recommended)
Rem/For    Define an auxiliary variable that will contain the desired tokens when it is %expanded%.
Rem/For    This variable is created from a string similar to the original FOR /F "tokens=x,y,m-n" one,
Rem/For    but that allows larger token numbers, ranges in descending order and increment greater than 1,
Rem/For    and it also returns the number of created tokens. See full description later.
Rem/For    In the example below this variable is called "tokens" and it contains these tokens/elements:
Rem/For    10 28 29 30 31 32 170 167 164 161, and "tokens.len" variable is also created with 10.
Rem/For    You may or may not enclose the tokens definition between quotes.
call :ExpandTokensString "tokens=10,28-32,170-161-3"
echo Definition:  tokens=10,28-32,170-161-3  created %tokens.len% tokens

Rem/For  Step 3: (optional)
Rem/For    Define the variable with the "delims" value that will be used in the nested FOR's.
Rem/For    This variable must be named "delims" and it contains *the definition* of
Rem/For    the same part of the FOR command, including the "delims=" word itself.
Rem/For    If you want no delims (default: TAB+space), delete "delims" variable: set "delims="
Rem/For    If you want "delims=", define: set "delims=delims="
set "delims=delims=,"

Rem/For  Step 4:
Rem/For    Create the macro that contains the nested FOR's. This step must be performed after both
Rem/For    FOR tokens and "delims" variables have been defined and before entering the main FOR /F command.
Rem/For    The subroutine uses the value of $numTokens variable as input.
call :CreateNestedFors

Rem/For  Step 5:
Rem/For    This is the main FOR /F command that process the file,
Rem/For    there is one additional nested FOR /F command for each 31 tokens (or part).
Rem/For    You must include the right filename in the next line:
for /F "usebackq tokens=1-31* %delims%" %%%$1% in ("dataFile.txt") do %NestedFors% (

   Rem/For  Step 6:
   Rem/For    Process the tokens. To just show them, use the "tokens" variable defined above:
   echo Tokens: %tokens%

   Rem/For    You may process "tokens" values via a plain FOR command:
   for %%a in (%tokens%) do echo Token via FOR: %%a

   Rem/For    ... or via another FOR /F command:
   for /F "tokens=1-%tokens.len%" %%a in ("%tokens%") do (
      echo Token #1 of FOR /F: %%a
      echo Token #6 of FOR /F: %%f
      echo Token #9 of FOR /F: %%i
   )

   Rem/For    You may also directly use the auxiliary "$#" tokens variables. See description below.
   echo Token #242 via its token variable: %%%$242%
   echo Full path of token #273: %%~F%$273%

   Rem/For    If there are additional tokens after the $numTokens number used to create this file,
   Rem/For    they will be grouped into the next token. For example, if this file was created via
   Rem/For    MakeForTokens.bat 300, then you may show the tokens beyond 300 this way:
   echo Additional tokens after the #300: %%%$301%

Rem/For  Closing parenthesis of the main FOR /F command
)

goto :EOF



Support subroutines. You must not modify any code below this line,
but all these explanations may be removed.



The next subroutine defines the auxiliary variables that are used to access FOR /F tokens.

These variables are *called* $1, $2, etc., so %$43% is *the token* number 43, and %%%$43% is
*the value* of such a token when this construct is placed inside the FOR /F command.
The usual FOR modifiers may be used: %%~NX%$1284% expands to name and extension of token 1284.

The method to create these variables was originally written by DosTips.com user dbenham. See:
http://www.dostips.com/forum/viewtopic.php?f=3&t=7703&p=51595#p51595

This subroutine does not use SETLOCAL. It modifies and deletes these variables: _cp, _hex, _pages
and uses the value of $numTokens variable as input.

:DefineForTokens

for /F "tokens=2 delims=:." %%p in ('chcp') do set /A "_cp=%%p, _pages=($numTokens/256+1)*2"
set "_hex= 0 1 2 3 4 5 6 7 8 9 A B C D E F"
call set "_pages=%%_hex:~0,%_pages%%%"
if %$numTokens% gtr 2048 echo Creating FOR tokens variables, please wait . . .
(
   echo FF FE
   for %%P in (%_pages%) do for %%A in (%_hex%) do for %%B in (%_hex%) do echo %%A%%B 3%%P 0D 00 0A 00
) > "%temp%\forTokens.hex.txt"
certutil.exe -decodehex -f "%temp%\forTokens.hex.txt" "%temp%\forTokens.utf-16le.bom.txt" >NUL
chcp 65001 >NUL
type "%temp%\forTokens.utf-16le.bom.txt" > "%temp%\forTokens.utf8.txt"
(for /L %%N in (0,1,%$numTokens%) do set /P "$%%N=")  < "%temp%\forTokens.utf8.txt"
chcp %_cp% >NUL
del "%temp%\forTokens.*.txt"
for %%v in (_cp _hex _pages) do set "%%v="
exit /B



The next subroutine creates the series of nested FOR's that covers all required FOR tokens;
it uses the value of $numTokens variable as input.

:CreateNestedFors

setlocal EnableDelayedExpansion
set /A "numTokens=$numTokens-1, mod=numTokens%%31, i=numTokens/31, lim=31"
if %mod% equ 0 set "mod=31"
set "NestedFors="
for /L %%i in (32,31,%numTokens%) do (
   if !i! equ 1 set "lim=!mod!"
   set "NestedFors=!NestedFors! for /F "tokens=1-!lim!* %delims%" %%!$%%i! in ("%%!$%%i!") do"
   set /A "i-=1"
)
for /F "delims=" %%a in ("!NestedFors!") do endlocal & set "NestedFors=%%a"
exit /B



The next subroutine expands a tokens definition string into a series of individual $# tokens variables.

The tokens definition string has the same form as the standard FOR /F "tokens=x,y,m-n" one.
Additionally, you may define a tokens range in descending order: "tokens=10-6" produces 10 9 8 7 6
or use an increment other than 1: "tokens=10-25+5,400-200-100" produces 10 15 20 25 400 300 200
When the subroutine ends, the total number of tokens created is stored in a global variable
with the same name as the tokens one, plus ".len" added at the end.

:ExpandTokensString variable=tokens definitions ...

setlocal EnableDelayedExpansion
set "var=" & set "tokens=" & set "len=0"
if "%~2" equ "" (set "params=%~1") else set "params=%*"
for %%a in (!params!) do (
   if not defined var (
      set "var=%%a"
   ) else for /F "tokens=1-3 delims=-+" %%i in ("%%a") do (
      if "%%j" equ "" (
         if %%i lss %$numTokens% set "tokens=!tokens! %%!$%%i!" & set /A len+=1
      ) else (
         if "%%k" equ "" (set "k=1") else set "k=%%k"
         if %%i leq %%j (
            for /L %%n in (%%i,!k!,%%j) do if %%n lss %$numTokens% set "tokens=!tokens! %%!$%%n!" & set /A len+=1
         ) else (
            for /L %%n in (%%i,-!k!,%%j) do if %%n lss %$numTokens% set "tokens=!tokens! %%!$%%n!" & set /A len+=1
         )
      )
   )
)
endlocal & set "%var%=%tokens%" & set "%var%.len=%len%"
exit /B

To use this application, run it with the desired number of tokens as the parameter, for example: MakeForTokens.bat 300. The application creates a Batch file named ForTokens.bat that contains the code, subroutines and values needed to access that many tokens, so users just need to insert their own details in the code to get a working program. The examples shown in the code are based on a data file with 305 tokens that is created at the beginning, so if you create ForTokens.bat with 300 tokens, it can run immediately with no modification. This is the output of such a program:

Code: Select all

Definition:  tokens=10,28-32,170-161-3  created 10 tokens
Tokens:  {10} {28} {29} {30} {31} {32} {170} {167} {164} {161}
Token via FOR: {10}
Token via FOR: {28}
Token via FOR: {29}
Token via FOR: {30}
Token via FOR: {31}
Token via FOR: {32}
Token via FOR: {170}
Token via FOR: {167}
Token via FOR: {164}
Token via FOR: {161}
Token #1 of FOR /F: {10}
Token #6 of FOR /F: {32}
Token #9 of FOR /F: {164}
Token #242 via its token variable: {242}
Full path of token #273: C:\Users\Antonio\Documents\tests\{273}
Additional tokens after the #300: {301},{302},{303},{304},{305},

If you want the generated ForTokens.bat program to process a different number of tokens, you may modify the value in the first line; just remember that this value must be one more than the base number of tokens. The additional token stores any values that appear in the file lines after the base number of tokens has been processed.
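
For readers who want to check a tokens definition before running the Batch file, the rules that :ExpandTokensString applies can be modelled as follows. This is a sketch in Python, not the subroutine itself: the function name is hypothetical, and the `limit` parameter stands in for the `$numTokens` value that the subroutine compares each token number against.

```python
import re


def expand_tokens(definition, limit):
    """Expand 'x,y,m-n,m-n+step,m-n-step' into a list of token numbers.

    Numbers that are not below `limit` are dropped, like the
    'if %%n lss %$numTokens%' checks in the Batch subroutine.
    """
    result = []
    for item in definition.split(","):
        parts = [int(p) for p in re.split(r"[-+]", item) if p]
        if len(parts) == 1:
            numbers = parts
        else:
            start, end = parts[0], parts[1]
            step = parts[2] if len(parts) > 2 else 1
            if start <= end:
                numbers = range(start, end + 1, step)    # ascending range
            else:
                numbers = range(start, end - 1, -step)   # descending range
        result.extend(n for n in numbers if n < limit)
    return result


# ForTokens.bat created with 300 tokens has $numTokens=301
print(expand_tokens("10,28-32,170-161-3", 301))
```

With the definition used in Step 2 this gives 10 28 29 30 31 32 170 167 164 161, the same ten tokens listed in the program output above.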

The method implemented in this program can process up to 4094 tokens, which is very close to the maximum of 4126 tokens that a FOR /F command can manage (as described in the previous post). I successfully tested this program on a text file with 4100 tokens (4090 one-character tokens plus the numbers from 4091 to 4100), processed with the maximum of 4094 tokens; the resulting code has 133 nested FOR /F commands, all of which include a "delims=," string. This is the code used to create the 4100-token data file, and the output of the program:

Code: Select all

(
   for /L %%i in (1,1,4090) do set /P "=x,"
   for /L %%i in (4091,1,4100) do set /P "=%%i,"
   echo/
) < NUL > dataFile.txt


Output:

Definition:  tokens=1000,2000,3000,4000,4090-4094  created 9 tokens
Tokens:  x x x x x 4091 4092 4093 4094
Additional tokens after the #4094: 4095,4096,4097,4098,4099,4100,
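
The count of 133 FOR /F commands can be reproduced from the arithmetic in :CreateNestedFors. The following Python sketch mirrors that subroutine's `mod`, `i` and `lim` variables; it is a model with a made-up function name, and its argument is the base number of tokens (that is, $numTokens minus one).

```python
def nested_for_plan(num_tokens):
    """Return the N of each 'tokens=1-N*': the main FOR /F first, then the nested ones."""
    mod = num_tokens % 31 or 31      # tokens left for the last nested FOR
    i = num_tokens // 31
    widths = [31]                    # the main FOR /F always uses tokens=1-31*
    for _ in range(32, num_tokens + 1, 31):
        widths.append(mod if i == 1 else 31)
        i -= 1
    return widths


plan = nested_for_plan(4094)
print(len(plan), sum(plan), plan[-1])
```

For 4094 tokens the plan has 133 entries that add up to 4094, and the last FOR /F takes only 2 named tokens before its "*".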

However, this test is not complete because it does not confirm that all the tokens in the data file are processed correctly, so I wanted a test that lets me identify any token I wish. This point is important to me because I do not want the same unpleasant surprise to happen again (I developed the first version of this application based on reports from three people who stated that "all characters from 0x80 to 0xFF works correctly as FOR tokens, excepting 0xFF" :evil: ).

Unfortunately, using distinct tokens decreases the maximum number because unique tokens must be longer, so a compromise between token length and number of tokens must be made. I used a "Base 62" number system with the characters "0123456789ABC...XYZabc...xyz" as digits in the 0..61 range, which allows 62*62 = 3844 different values to be written using two characters each. This range of values covers the 2771 possible tokens that can be processed when the tokens have two characters each. I wrote this example by creating a ForTokens.bat file with the right number of tokens; then I modified the code to create the data file with two-character tokens and to process the file by converting each token from "Base 62" into the equivalent decimal value. Finally, I removed all comments from the file. This is the result:

Code: Select all

@echo off & setlocal EnableDelayedExpansion & set "$numTokens=2771"

rem Step 0: Create the data file with 2770 tokens comprised of two "Base 62" digits each:
echo Creating the data file, please wait . . .
set "base=0 1 2 3 4 5 6 7 8 9"
set "base=%base% A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"
set "upCase=%base: =%"
set "i=0"
for %%a in (%base%) do set /A "base[%%a]=i, i+=1"
set "base=%base% a b c d e f g h i j k l m n o p q r s t u v w x y z"
set "tokens=2770"
(
   set /P "=%base:~2% "
   set /A tokens-=61
   for %%A in (%base:~2%) do for %%B in (%base%) do if !tokens! gtr 0 (
      set /P "=%%A%%B "
      set /A tokens-=1
   )
   echo/
) < NUL > dataFile.txt

rem Step 1: Define the series of auxiliary variables that will be used as FOR tokens:
call :DefineForTokens

rem Step 3: Define the variable with the "delims" value that will be used in the nested FOR's:
set "delims="

rem Step 4: Create the macro that contains the nested FOR's:
rem         This is done just once, before entering the next loop
call :CreateNestedFors

rem Loop: Read tokens definitions from the user, expand them and process the resulting tokens

cls
echo/
echo Enter tokens definitions; the valid tokens numbers are in 1..2770 range.
echo/
echo You may insert a tokens range in ascending or descending order with an optional
echo increment/decrement different than 1. For example: tokens=10-6,87,100-500+100

:loop
echo/
set /P "tokens=tokens="
if errorlevel 1 goto :EOF

rem Step 2: Define the auxiliary variable that will contain the desired tokens:
call :ExpandTokensString "tokens=%tokens%"
echo %tokens.len% tokens defined

rem Step 5: This is the main FOR /F command that process the file:
for /F "usebackq tokens=1-31* %delims%" %%%$1% in ("dataFile.txt") do %NestedFors% (

   rem Step 6: Process the tokens:
   echo Original: %tokens%
   set "line="
   for %%a in (%tokens%) do (
      set "token=%%a"
      set "A=!token:~0,1!" & set "B=!token:~1,1!"
      if not defined B set "B=!A!" & set "A=0"
      set /A X=base[!A!], Y=base[!B!]
      for /F %%c in ("!A!") do if "!upCase:%%c=%%c!" neq "%upCase%" set /A X+=26
      for /F %%c in ("!B!") do if "!upCase:%%c=%%c!" neq "%upCase%" set /A Y+=26
      set /A num=X*62+Y
      set "line=!line! !num!"
   )
   echo Decimal:  !line!

)
goto :loop



:DefineForTokens

for /F "tokens=2 delims=:." %%p in ('chcp') do set /A "_cp=%%p, _pages=($numTokens/256+1)*2"
set "_hex= 0 1 2 3 4 5 6 7 8 9 A B C D E F"
call set "_pages=%%_hex:~0,%_pages%%%"
if %$numTokens% gtr 2048 echo Creating FOR tokens variables, please wait . . .
(
   echo FF FE
   for %%P in (%_pages%) do for %%A in (%_hex%) do for %%B in (%_hex%) do echo %%A%%B 3%%P 0D 00 0A 00
) > "%temp%\forTokens.hex.txt"
certutil.exe -decodehex -f "%temp%\forTokens.hex.txt" "%temp%\forTokens.utf-16le.bom.txt" >NUL
chcp 65001 >NUL
type "%temp%\forTokens.utf-16le.bom.txt" > "%temp%\forTokens.utf8.txt"
(for /L %%N in (0,1,%$numTokens%) do set /P "$%%N=")  < "%temp%\forTokens.utf8.txt"
chcp %_cp% >NUL
del "%temp%\forTokens.*.txt"
for %%v in (_cp _hex _pages) do set "%%v="
exit /B



:CreateNestedFors

setlocal EnableDelayedExpansion
set /A "numTokens=$numTokens-1, mod=numTokens%%31, i=numTokens/31, lim=31"
if %mod% equ 0 set "mod=31"
set "NestedFors="
for /L %%i in (32,31,%numTokens%) do (
   if !i! equ 1 set "lim=!mod!"
   set "NestedFors=!NestedFors! for /F "tokens=1-!lim!* %delims%" %%!$%%i! in ("%%!$%%i!") do"
   set /A "i-=1"
)
for /F "delims=" %%a in ("!NestedFors!") do endlocal & set "NestedFors=%%a"
exit /B



:ExpandTokensString variable=tokens definitions ...

setlocal EnableDelayedExpansion
set "var=" & set "tokens=" & set "len=0"
if "%~2" equ "" (set "params=%~1") else set "params=%*"
for %%a in (!params!) do (
   if not defined var (
      set "var=%%a"
   ) else for /F "tokens=1-3 delims=-+" %%i in ("%%a") do (
      if "%%j" equ "" (
         if %%i lss %$numTokens% set "tokens=!tokens! %%!$%%i!" & set /A len+=1
      ) else (
         if "%%k" equ "" (set "k=1") else set "k=%%k"
         if %%i leq %%j (
            for /L %%n in (%%i,!k!,%%j) do if %%n lss %$numTokens% set "tokens=!tokens! %%!$%%n!" & set /A len+=1
         ) else (
            for /L %%n in (%%i,-!k!,%%j) do if %%n lss %$numTokens% set "tokens=!tokens! %%!$%%n!" & set /A len+=1
         )
      )
   )
)
endlocal & set "%var%=%tokens%" & set "%var%.len=%len%"
exit /B

Output example:

Code: Select all

Enter tokens definitions; the valid tokens numbers are in 1..2770 range.

You may insert a tokens range in ascending or descending order with an optional
increment/decrement different than 1. For example: tokens=10-6,87,100-500+100

tokens=10-6,87,100-500+100
11 tokens defined
Original:  A 9 8 7 6 1P 1c 3E 4q 6S 84
Decimal:   10 9 8 7 6 87 100 200 300 400 500

tokens=500-2500+500
5 tokens defined
Original:  84 G8 OC WG eK
Decimal:   500 1000 1500 2000 2500

tokens=2760-2775
11 tokens defined
Original:  iW iX iY iZ ia ib ic id ie if ig
Decimal:   2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770

tokens=1,500,1000,2000,2500,2600,2700
7 tokens defined
Original:  1 84 G8 WG eK fw hY
Decimal:   1 500 1000 2000 2500 2600 2700

tokens=
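
The "Base 62" values in this output can be cross-checked outside Batch. The Python sketch below uses the digit order stated in the post (0-9, then A-Z, then a-z); the function names are only illustrative.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base62(number):
    """Write a non-negative integer with the 62 digits above, no leading zeros."""
    text = ""
    while True:
        number, remainder = divmod(number, 62)
        text = DIGITS[remainder] + text
        if number == 0:
            return text


def from_base62(text):
    """Convert a Base 62 token back into its decimal value."""
    value = 0
    for char in text:
        value = value * 62 + DIGITS.index(char)
    return value


for token in ("A", "1P", "1c", "84", "G8", "eK", "ig"):
    print(token, from_base62(token))
```

Because the data file is written in counting order, the token at position n is simply n written in Base 62, so `to_base62(2770)` gives `ig`, the last token of the file.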

Final note: this program fails when a file line has fewer tokens than expected and the last nested FOR /F does not process any data. I'll try to solve this problem in the next version...

EDIT 2017-04-24: The third version of this application, which correctly processes lines with a variable number of tokens, is ready. You may download it from this post.

Antonio

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#52 Post by dbenham » 25 Mar 2017 00:15

Aacini wrote:...
Operation rules of FOR /F command about "tokens=..." option:

  • The maximum number of tokens in a single FOR /F command is 32, including the final "rest of tokens" one: "tokens=1-31*".
  • The maximum length of each token is limited by the maximum length of the command line that uses it, which is 8191 bytes. Shorter commands allow longer tokens.
  • The maximum number of tokens in a line of a text file is 4126, when every token is just one character long and the command that processes the final "rest of tokens" token is not too long. If the tokens after the 31st are longer, the maximum number decreases accordingly, because the "rest of tokens" token must always fit within its 8191-byte command line.
  • The maximum length of a line of a text file may be close to 261000 bytes, when all 32 possible tokens have a length close to 8191 bytes.

Nice investigation, and a very useful summary. :D
Your post is well laid out, and relatively easy to follow.

I think jeb already discovered at least some of these points, and posted examples of working with more than 8191 characters somewhere on DosTips. I believe it may have been during development of macros with arguments, but I'm not sure. I did not fully understand what was happening in his post at that time.

Aacini wrote:Finally, the solution given by Dave was very simple and provided a way to use many more tokens than originally planned.

I would never have looked at this issue had you not started this thread. And I could not have created the "simple" solution without penpen's and aGerman's keen insight into how the variables map to the Unicode code points, and most importantly, penpen's Eureka moment and critical experiment with code page 65001.
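
As a rough illustration of that mapping, the hex listing that Aacini's :DefineForTokens writes to forTokens.hex.txt decodes to one consecutive run of code points. The Python sketch below rebuilds the listing and decodes it with `bytes.fromhex`, which is only a stand-in for what `certutil -decodehex` produces. It does not model the `chcp 65001` step or `set /P`, and the function names are hypothetical.

```python
HEX = "0123456789ABCDEF"


def hex_listing(num_tokens):
    """Rebuild the lines echoed into forTokens.hex.txt for a given $numTokens."""
    pages = num_tokens // 256 + 1
    lines = ["FF FE"]                                  # UTF-16LE byte order mark
    for page in HEX[:pages]:
        for high in HEX:
            for low in HEX:
                # low byte, high byte (0x30 + page), then CR LF in UTF-16LE
                lines.append(f"{high}{low} 3{page} 0D 00 0A 00")
    return lines


def token_chars(num_tokens):
    """Characters that would be read into $0 .. $numTokens, one per line."""
    raw = bytes.fromhex(" ".join(hex_listing(num_tokens)))
    text = raw.decode("utf-16")                        # the BOM selects little endian
    return text.split("\r\n")[: num_tokens + 1]


chars = token_chars(301)
print(len(chars), hex(ord(chars[0])), hex(ord(chars[43])))
```

In this model $0 is U+3000 and each following variable is the next code point, so $43 would be U+302B. The variables are contiguous in code point order, which is the property the nested FOR /F commands rely on.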


Aacini wrote:However, this test is not complete because it does not confirm that all the tokens in the data file are processed correctly, so I wanted a test that lets me identify any token I wish. This point is important to me because I do not want the same unpleasant surprise to happen again (I developed the first version of this application based on reports from three people who stated that "all characters from 0x80 to 0xFF works correctly as FOR tokens, excepting 0xFF" :evil: ).

I still stand by the integrity of my SO post :twisted: In that post I stated that 0x80 - 0xFE could be used, but 0xFF could not be used. That is true for code pages 437 and 850 (and I suspect others). But I was not aware that character availability can change with different code pages. More importantly, I was not aware that the sequence of the characters is not the same as the byte value. So I was correct that many characters could be used, but the jumbled order means that they cannot be used very effectively without significant difficulty.

I do plan on updating the SO post to bring it more up-to-date with the current level of understanding. I won't go into too much detail with the mapping, but rather will link back to this thread.


Aacini wrote:The purpose of my program is to assemble a method that allows a text file to be processed via a FOR /F command using many tokens, as many as possible, but in a simple way. Note that this application is not just a technical curiosity: a practical use would perhaps not be to process hundreds or thousands of tokens in each line of a very large file, but to process just a few tokens taken from a large file that may contain hundreds of values in each line, like a very large spreadsheet.

Before you spend much more time in refining this effort at allowing the user to specify which tokens to parse, I suggest you compare the performance of your existing code with my existing code that parses all tokens from 1 to n. Just because I parse all the tokens, doesn't mean the DO loop has to access them. But blindly parsing the entire range is so much simpler than trying to efficiently parse only the requested tokens. Unless your code is significantly faster, I'm not sure it is worth the extra complexity. One nice aspect of parsing all tokens is it makes it very easy to keep track of which token variable corresponds to which column position.

===============================================

I have one addition to make to my code - A :getTokenValue function that can get the value of any dynamically chosen token. I have included the code for all 4 previously developed routines, just to have everything in one place. But this demo only uses 2 of the 4, plus the new function.

Code: Select all

@echo off
setlocal enableDelayedExpansion
cls

:: Define a test line with 300 tokens in the format ("1","2","3",..."300",)
set "ln="
for /l %%N in (1 1 300) do set "ln=!ln!"%%N","

:: Define a macro to parse all 300 tokens
call :defineFor forMacro A 300 ","

:: Parse the 300 tokens with the macro
for /f "delims=" %%A in ("!ln!") do %forMacro% (

  %= Use the new getTokenValue function to randomly select and display tokens =%
  for /l %%N in (1 1 5) do (
    set /a "token=!random! %% 300 + 1"
    call :getTokenValue val $!token!
    echo token !token! = !val!
  )
  echo(
  %= Do the same thing, but with the ~ modifier =%
  for /l %%N in (1 1 5) do (
    set /a "token=!random! %% 300 + 1"
    call :getTokenValue val $!token! ~
    echo unquoted token !token! = !val!
  )
)

exit /b



:getTokenValue  RtnVar  TokenName  [Modifiers]
::
:: Assigns the value of FOR variable TokenName to environment variable RtnVar.
:: Expansion modifiers may be specified with the optional Modifiers argument.
::
setlocal enableDelayedExpansion
set "token=!%~2!"
for %%. in (.) do endlocal & set "%~1=%%%~3%token%"
exit /b


:defineFor  ForMacroName  InputVar  TokenCount  [DelimChars]
::
:: Defines a macro to be used for parsing an arbitrary number of tokens from
:: a FOR variable string. The macro always parses one additional token to hold
:: any remainder of the line that lies beyond the TokenCount tokens. The input
:: line should have at least as many tokens as TokenCount, else you run the
:: risk of the macro not returning anything (no tokens parsed).
::
::    ForMacroName = The name of the macro variable to be created.
::
::    InputVar = The name of the FOR variable that contains the string of tokens.
::
::    TokenCount = The number of tokens to parse.
::                 The maximum value is 2303 (256*9-1)
::
::    DelimChars = An optional string of one or more characters, each of which
::                 is treated as a token delimiter. Default is "<tab><space>".
::                 If <space> is included in the string, then it must be the
::                 last character in the string.
::
:: Tokens are accessed by $n variables.
:: For example, %%%$45% would represent the 45th token.
::
:: FOR /F modifiers may be freely used. For example, %%~nx%$10% would treat the
:: 10th token as a file path, and would expand to the file name and extension.
::
:: Normally, a single FOR /F is limited to 31 tokens, but the macro supports
:: many more, theoretically as many as 2303. However, each line to be parsed
:: must be less than 8191 characters in length.
::
:: This function may be called with delayed expansion enabled or disabled.
:: It is generally recommended that the macro be used with delayed expansion
:: disabled so that tokens containing ! are not corrupted.
::
:: This function automatically calls :defineForChars to define enough $n
:: variables to satisfy the TokenCount+1 tokens.
::
:: Example usage - Suppose you want to parse a well behaved CSV file named
:: test.csv that contains 300 columns. All lines must have the same number of
:: columns, and no column value may contain a comma.
::
:: The following code will correctly parse each data line of test.csv:
::
::    @echo off
::    setlocal disableDelayedExpansion
::    call :defineFor For300InA A 300 ","
::    for /f "skip=1 delims=" %%A in (test.csv) do %For300InA% (
::      echo token   1 = %%%$1%
::      echo token   2 = %%%$2%
::      echo ...
::      echo token 300 = %%%$300%
::    )
::
:: If the first token might begin with any character, including the default
:: EOL character, then the FOR /F line should be changed as follows:
::
::    for /f skip^=1^ delims^=^ eol^= %%A in (test.csv) do %For300InA% (
::   
if %$max%0 gtr %~30 goto :defineForInternal
set /a "$max=(%~3+256)/256"
call :defineForChars %$max%
:defineForInternal
setlocal enableDelayedExpansion
set "delims=%~4"
if not defined delims set "delims= "
set "in=%~2"
set "macro="
set /a max=31, end=0
for /l %%N in (1 31 %~3) do (
  if %%N neq 1 set "in=!$%%N!"
  set /a end+=31
  if !end! gtr %~3 set /a "max=%~3-%%N+1"
  set "macro=!macro! for /f "eol=!delims:~0,1! tokens=1-!max!* delims=!delims!" %%!$%%N! in ("%%!in!") do"
)
for /f "delims=" %%A in ("!macro! ") do endlocal & set "%~1=%%A"
exit /b


:defineForChars  Count
::
:: Defines variables to be used as FOR /F tokens, from $1 to $n,
:: where n = Count*256
:: Also defines $max = Count*256.
:: No other variables are defined or tampered with.
::
:: The maximum allowed Count is 9, meaning the max $max is 2304.
::
:: Once defined, the variables are very useful for parsing lines with a fixed
:: number of tokens > 31, as the values are guaranteed to be contiguous within
:: the FOR /F mapping scheme.
::
:: For example, you can use $1 as a FOR variable by using %%%$1%.
::
::   FOR /F "TOKENS=1-31" %%%$1% IN (....) DO ...
::
::      %%%$1% = token 1, %%%$2% = token 2, ... %%%$31% = token 31
::
:: This routine never uses SETLOCAL, and works regardless of whether delayed
:: expansion is enabled or disabled.
::
:: Three temporary files are created and deleted in the %TEMP% folder, and the active
:: code page is temporarily set to 65001, and then restored to the starting value
:: before returning. Once defined, the $n variables can be used with any code page.
::
for /f "tokens=2 delims=:." %%P in ('chcp') do call :DefineForCharsInternal %1
exit /b
:defineForCharsInternal
set /a $max=%1*256
>"%temp%\forVariables.%~1.hex.txt" (
  echo FF FE
  for %%H in (
    "0 1 2 3 4 5 6 7 8 9 A B C D E F"
  ) do for /l %%N in (1 1 %~1) do for %%A in (%%~H) do for %%B in (%%~H) do (
    echo %%A%%B 0%%N 0D 00 0A 00
  )
)
>nul certutil.exe -decodehex -f "%temp%\forVariables.%~1.hex.txt" "%temp%\forVariables.%~1.utf-16le.bom.txt"
>nul chcp 65001
>"%temp%\forVariables.%~1.utf8.txt" type "%temp%\forVariables.%~1.utf-16le.bom.txt"
<"%temp%\forVariables.%~1.utf8.txt" (for /l %%N in (1 1 %$max%) do set /p "$%%N=")
for %%. in (dummy) do >nul chcp %%P 
del "%temp%\forVariables.%~1.*.txt"
exit /b


:defineVariantFor  ForMacroName  InputVar  TokenCount  [DelimChars  [DummyChar]]
::
:: Nearly the same as :DefineFor, except the resultant macro is designed to work
:: with input lines that may have fewer than TokenCount tokens.
::
::    ForMacroName = The name of the macro variable to be created.
::
::    InputVar = The name of the FOR variable that contains the string of tokens.
::
::    TokenCount = The maximum number of tokens to parse.
::                 The maximum value is 2226.
::
::    DelimChars = An optional string of one or more characters, each of which
::                 is treated as a token delimiter. Default is "<tab><space>".
::                 If <space> is included in the string, then it must be the
::                 last character in the string.
::
::    DummyChar  = An optional string to use as a prepended dummy token for each
::                 FOR /F iteration. The DummyChar is needed to support the
::                 requested number of tokens, yet still have the macro return
::                 results if the input line has fewer than TokenCount tokens.
::                 The DummyChar must not match any character within Delims.
::                 The default is "x". This optional parameter is only needed if
::                 Delims includes "x".
::
:: Tokens are accessed by $vn variables.
:: For example, %%%$v45% would represent the 45th token.
::
:: Any remaining unparsed string is contained in one of two possible tokens:
::   - If TokenCount is divisible by 30,
::     then the remainder is in $vExtra(TokenCount+1)
::   - Else the remainder is in $v(TokenCount+1)
::
:: So if TokenCount=300 (a multiple of 30), then the remainder is in $vExtra301.
:: But if TokenCount=299 (not a multiple of 30), then the remainder is in $v300.
::
:: This function automatically calls :defineVariantForChars to define enough $vn
:: and $vExtran variables to satisfy the TokenCount+1 requirement.
::
:: All other behaviors are the same as for :defineFor
::
if %$vmax%0 gtr %~30 goto :defineVariantForInternal
set /a "$vmax=(%~3+(%~3+30)/30+257)/256"
call :defineVariantForChars %$vmax%
:defineVariantForInternal
setlocal enableDelayedExpansion
set "dummy=%~5"
if not defined dummy set "dummy=x"
set "delims=%~4"
if not defined delims set "delims= "
set "dummy=!dummy!!delims:~0,1!"
set "in=%~2"
set "macro="
set /a max=31, end=0
for /l %%N in (1 30 %~3) do (
  if %%N neq 1 set "in=!$vExtra%%N!"
  set /a end+=30
  if !end! gtr %~3 set /a "max=%~3-%%N+2"
  set "macro=!macro! for /f "eol=!delims:~0,1! tokens=1-!max!* delims=!delims!" %%!$vExtra%%N! in ("!dummy!%%!in!") do"
)
for /f "delims=" %%A in ("!macro! ") do endlocal & set "%~1=%%A"
exit /b


:defineVariantForChars  Count
::
:: Nearly the same as :defineForChars, except this function defines $vn, $vmax,
:: and $vExtran variables instead of $n and $max variables.
::
:: These variables are useful for parsing lines with many tokens when the number
:: of tokens may vary on each line.
::
:: For every 30 $vn variables, there is one $vExtran variable defined.
::
:: The maximum Count allowed is 9, which results in a max $vmax of 2227, which
:: corresponds to the largest $vn of $v2227.
::
:: All other behaviors are the same as for :defineForChars.
::
for /f "tokens=2 delims=:." %%P in ('chcp') do call :DefineVariantForCharsInternal %1
exit /b
:defineVariantForCharsInternal
set /a "$vmax=%1*256, $vmax-=($vmax+30)/30"
>"%temp%\forVariables.%~1.hex.txt" (
  echo FF FE
  for %%H in (
    "0 1 2 3 4 5 6 7 8 9 A B C D E F"
  ) do for /l %%N in (1 1 %~1) do for %%A in (%%~H) do for %%B in (%%~H) do (
    echo %%A%%B 0%%N 0D 00 0A 00
  )
)
>nul certutil.exe -decodehex -f "%temp%\forVariables.%~1.hex.txt" "%temp%\forVariables.%~1.utf-16le.bom.txt"
>nul chcp 65001
>"%temp%\forVariables.%~1.utf8.txt" type "%temp%\forVariables.%~1.utf-16le.bom.txt"
<"%temp%\forVariables.%~1.utf8.txt" (for /l %%N in (1 1 %$vmax%) do (
   2>nul set /a "1/((%%N-1)%% 30)" || set /p "$vExtra%%N="
  set /p "$v%%N="
))
for %%. in (dummy) do >nul chcp %%P 
del "%temp%\forVariables.%~1.*.txt"
exit /b
-- SAMPLE OUTPUT --

Code: Select all

token 168 = "168"
token 211 = "211"
token 95 = "95"
token 80 = "80"
token 111 = "111"

unquoted token 37 = 37
unquoted token 255 = 255
unquoted token 173 = 173
unquoted token 60 = 60
unquoted token 31 = 31
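
As a quick check of the limits documented for :defineVariantForChars, the sketch below repeats the $vmax arithmetic from :defineVariantForCharsInternal for every allowed Count. The variable names chars and vmax are only for illustration and are not part of the posted routines.

```batch
@echo off
setlocal enableDelayedExpansion
rem chars = code points generated for this Count, 256 per block.
rem vmax  = what is left for $vn after reserving the $vExtran variables,
rem         using the same integer arithmetic as the routine itself.
for /l %%C in (1,1,9) do (
  set /a "chars=%%C*256, vmax=chars-(chars+30)/30"
  echo Count=%%C: !chars! characters, $vmax=!vmax!
)
```

With Count=9 the last line reads "Count=9: 2304 characters, $vmax=2227", which agrees with the documented maximum TokenCount of 2226 plus one variable for the remainder.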


Dave Benham

#53 Post by Thor » 25 Mar 2017 11:00

Hi Aacini,

I've run your "MakeForTokens.bat" with a token of 1342 and I did not see the lines that said:
Token via FOR: {}
...
...
anymore. Do you know why?
Attachments
ForTokens.zip

#54 Post by Aacini » 25 Mar 2017 12:07

@Thor,

As I said in the "FOR /F rules", the maximum number of tokens depends on the length of the "rest of tokens" placed after token # 31 and the length of the command where token # 32 is used. In the program used to test these points, the maximum number of tokens formed with numbers enclosed in braces, like "{1},{2},...", is 1348. However, the commands used in this nested FOR /F method are more complicated, so the maximum number decreases slightly to 1342...

If you want to run a series of tests over several files with different types of data, it would be a good idea to first test such files with the last code given in "FOR /F rules" (where the created forB.txt file must be equal to forA.txt), so you know the maximum possible number of tokens with such data. After that, you should expect a slightly smaller number when you process the file with this nested FOR /F method.
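
The dependence on the text that remains after token # 31 can be checked with simple arithmetic. The sketch below assumes the "{1},{2},..." test data described above, where token n occupies its digits plus three characters, and adds up the length of the remainder for 1348 tokens. The variable names N and len are hypothetical.

```batch
@echo off
setlocal
rem N is the token count under test; 1348 is the figure quoted above.
rem len accumulates the length of the text {32},{33},...,{N},
rem Each token costs its digit count plus 3 characters for the braces and comma.
set /a "N=1348, len=0"
for /l %%i in (32,1,%N%) do (
  if %%i geq 1000 (
    set /a "len+=7"
  ) else if %%i geq 100 (
    set /a "len+=6"
  ) else (
    set /a "len+=5"
  )
)
echo Remainder after token 31 for %N% tokens: %len% characters
```

It reports 8183 characters, which is consistent with the 8191 limit discussed in this thread: two more 7-character tokens would push the remainder past it.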

Antonio

#55 Post by Thor » 25 Mar 2017 13:53

Hi Aacini:
I've got it. Thanks for the clarification. :D

Hi dbenham:
I like your new :getTokenValue function add-in.

#56 Post by Aacini » 28 Mar 2017 19:23

@Dave,

This is funny! At first, you complained about the lack of detail in my reply regarding the participation of penpen and aGerman in finding the solution:

dbenham wrote:
Aacini wrote:Finally, the solution given by Dave was very simple and provided a way to use much more additional tokens than the originally planned ones.

I would never have looked at this issue had you not started this thread. And I could not have created the "simple" solution without penpen's and aGerman's keen insight into how the variables map to the Unicode code points, and most importantly, penpen's Eureka moment and critical experiment with code page 65001.


It seems that you didn't realize that I just wrote a short summary of the current situation as a brief introduction to my reply, because otherwise it would have appeared somewhat incoherent. If readers want to know more details about the solution and the participants involved in it, they just need to read the thread...

After that, you justified your own lack of very important operation details on the referenced post at Stack Overflow using totally unrelated points:

dbenham wrote:I still stand by the integrity of my SO post :twisted: In that post I stated that 0x80 - 0xFE could be used, but 0xFF could not be used. That is true for code pages 437 and 850 (and I suspect others). But I was not aware that character availability can change with different code pages. More importantly, I was not aware that the sequence of the characters is not the same as the byte value. So I was correct that the many characters could be used, but the jumbled order means that they cannot be used very effectively without significant difficulty.

This answer tempted me to post a further reply, so here it is. I am copying here the relevant parts of the SO post, because these parts provide a frame of reference on this matter for other readers.

at this SO question the OP wrote:Windows batch script to parse CSV file and output a text file

I've seen a response on another page (Help in writing a batch script to parse CSV file and output a text file) - brilliant code BTW:

Code: Select all

@ECHO OFF
IF "%~1"=="" GOTO :EOF
SET "filename=%~1"
SET fcount=0
SET linenum=0
FOR /F "usebackq tokens=1-10 delims=," %%a IN ("%filename%") DO ^
CALL :process "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i" "%%j"
GOTO :EOF

more code omitted here . . .


It works brilliantly for an example csv file I made... (with 3 fields per line)

However the actual file I want to change comprises of 64 fields - so I altered the tokens=1-10 to tokens=1-64 and increased the %%a etc right up to 64 variables (the last being called %%BL for example). Now, however, when I run the batch on my 'big' csv file (with the 64 tokens) nothing happens. No errors (good) but no output! (bad). If anyone can help that would be fantastic... am soooo close to getting the whole app working if I can just nail this last bit! Or if anyone has some example code that will do similar for an indefinite number of tokens...


This is your answer, the one I read that encouraged me to write this application. I just copied the parts that are directly related to the management of more than 32 tokens in FOR /F command.

at this SO answer dbenham wrote:
Important update - I don't think Windows batch is a good option for your needs because a single FOR /F cannot parse more than 31 tokens. See the bottom of the Addendum below for an explanation.

However, it is possible to do what you want with batch. This ugly code will give you access to all 64 tokens.

Code: Select all

for /f "usebackq tokens=1-29* delims=," %%A in ("%filename%") do (
  for /f "tokens=1-26* delims=," %%a in ("%%^") do (
    for /f "tokens=1-9 delims=," %%1 in ("%%{") do (
      rem Tokens 1-26 are in variables %%A - %%Z
      rem Token  27 is in %%[
      rem Token  28 is in %%\
      rem Token  29 is in %%]
      rem Tokens 30-55 are in %%a - %%z
      rem Tokens 56-64 are in %%1 - %%9
    )
  )
)

The addendum provides important info on how the above works.

. . .

Addendum

I performed some tests, and can report the following (updated in response to jeb's comment):

Most characters can be used as a FOR variable, including extended ASCII 128-254. But some characters cannot be used to define a variable in the first part of a FOR statement, but can be used in the DO clause. A few can't be used for either. Some have no restrictions, but require special syntax.

The following is a summary of characters that have restrictions or require special syntax.

Code: Select all

A table with several characters appears here, indicating whether each one may be used to *define* or *access* a FOR /F token.
The equal-sign character is listed because it cannot be used to define a FOR /F token, but it can be accessed once it is implicitly defined as a subsequent token.
Character 255 is the only one greater than 127 that appears here; it can neither define a FOR token nor access one.

. . .

Some characters cannot be used to define a FOR variable... But %%= can be implicitly defined by using the TOKENS option, and the value accessed in the DO clause like so:

Code: Select all

for /f "tokens=1-3" %%^< in ("A B C") do echo %%^< %%^= %%^>

. . .


I want to note that this phrase: "some characters cannot be used to define a variable in the first part of a FOR statement, but can be used in the DO clause" NECESSARILY IMPLIES that the character was tested in a sequence that places it after another one. This is the case of the equal sign "=", which in the previous example is placed after the less-than sign "<".

The only way that character 255 (0xFF) can be reported as not accessible in the DO clause is if you tested it with A PREVIOUS CHARACTER used "to define a variable in the first part of a FOR statement". This result implies two points: 1- Character 255 fails in this SUCCESSIVE TOKENS test. 2- No other character in the 128-254 range was reported as failing in the same table/test.

If you read this phrase: "Most characters can be used as a FOR variable, including extended ASCII 128-254" and this one: "The following is a summary of characters that have restrictions or require special syntax", combined with the fact that the table shows characters that have restrictions in the successive FOR -> DO tokens test (like the equal sign) AND that character 255 is the only one in the 128-255 range that was included in this table, then the conclusion about the availability of characters in the 128-254 range as valid FOR /F tokens is obvious! Should I explain what that conclusion is?

Antonio

#57 Post by dbenham » 28 Mar 2017 23:03

Why do I feel like I've entered a pissing match :?:

I saw you give me credit for my solution, which was appreciated. All I was trying to do was deflect some of the credit to others that I think deserve it, including yourself. Yet, I seem to have somehow offended you :?

Regarding my SO post, I've already acknowledged that it needs improvement. I was under the false impression that the FOR variable order matched the numeric byte value sequence, and I was unaware that the code page could affect things - both of which are important discoveries in this thread. Despite those misunderstandings on my part, I was able to accurately discern and document (with some help from jeb) which characters could be used in a FOR /F statement (at least for code pages 437 and 850). I don't think anyone documented that information before.

Aacini wrote:If you read this phrase: "Most characters can be used as a FOR variable, including extended ASCII 128-254" and this one: "The following is a summary of characters that have restrictions or require special syntax", combined with the fact that the table shows characters that have restrictions in the successive FOR -> DO tokens test (like the equal sign) AND that character 255 is the only one in the 128-255 range that was included in this table, then the conclusion about the availability of characters in the 128-254 range as valid FOR /F tokens is obvious! Should I explain what that conclusion is?
Yes, I had tested some ranges, but only enough to discover which characters could be used, and not enough to discover that the order did not match the byte code. My misunderstanding could have led to the wrong conclusion had the ordering been different, but I lucked out in this case.

Six years later, you have made a critical new discovery that the FOR variable mapping does not match the byte code value, and that has led to a collective effort that resulted in new FOR /F techniques that no one ever thought possible. I think that is something to celebrate :D


Dave Benham

#58 Post by Thor » 30 Mar 2017 15:56

This is a proof of concept that handles a large set of tokens, from 1 to 10,000.

- Since a FOR /F loop cannot handle a line longer than 8191 bytes, I divided the token set
into smaller file chunks, each no larger than 8191 bytes, in order to accommodate a larger
token set.

- Data1.txt has 1850 tokens in the range from 1-1850
- Data2.txt has 1630 tokens in the range from 1851-3480
- Data3.txt has 1630 tokens in the range from 3481-5110
- Data4.txt has 1630 tokens in the range from 5111-6740
- Data5.txt has 1630 tokens in the range from 6741-8370
- Data6.txt has 1630 tokens in the range from 8371-10000

- Just unzip the enclosed zip file into any drive
- Run "demo_get_random_token.bat" to see 30 random tokens in the range from 1 to 10,000. Yes, it is that many tokens for now, and there is no limit as long as you create enough data files. The values are real numbers, not cryptic one- or two-letter names whose meaning you have no clue about. :-)

- You can play around with the input value by increasing or decreasing it in the for..loop.

- For the result, you should see something like this: (for example)
2401 (this is the input value)
token 551 = 2401 (551 is the index of the token within its dataN.txt file and 2401 is the output value, so the input and output values should match)
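
The mapping from a token number to a data file and an index can be sketched with the chunk sizes listed above (1850 tokens in Data1.txt, then 1630 per file). This is only an illustration of the arithmetic, not the code in the attached demo; the variable names n, file and index are hypothetical.

```batch
@echo off
setlocal
rem n is the global token number to look up (1-10000).
set /a "n=2401"
rem Data1.txt holds tokens 1-1850; every later file holds 1630 tokens.
if %n% leq 1850 (
  set /a "file=1, index=n"
) else (
  set /a "file=(n-1851)/1630+2, index=(n-1851)%%1630+1"
)
echo token %n% is in Data%file%.txt at position %index%
```

For n=2401 it prints "token 2401 is in Data2.txt at position 551", matching the index shown in the example output.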
Attachments
Token_Demo.zip

#59 Post by Aacini » 02 Apr 2017 20:51

dbenham wrote:Why do I feel like I've entered a pissing match :?:

[snip]

Six years later, you have made a critical new discovery that the FOR variable mapping does not match the byte code value, and that has led to a collective effort that resulted in new FOR /F techniques that no one ever thought possible. I think that is something to celebrate :D

Dave Benham

Of course, you are right: the results obtained from this thread are good, no matter how we arrived at them, and I celebrate that. However, that is not my concern, and I am astonished that you still haven't realized what my point is! I want to make one last attempt to show you exactly what I am talking about, but this time I will use foxidrive's method: I will tell you a funny story based on a non-computer topic, with your place and mine interchanged; perhaps this way you may appreciate my point of view. However, I am tired of this absurd discussion, so I will not answer any further reply about it...

Let's suppose that I have a used Compact Disc player and changer that accepts 50 disks (like this one) that I am selling on eBay. Suppose further that you are interested in the player, so before you buy it you carefully read my description of the item: "CD Player and Changer with capacity for 50 disks. I tested it with all 50 disks inserted and the player works correctly, except for disk # 50. All controls to play songs and change disks work correctly and the music is clear. Two people also tested the player independently: they opened the disk-loading doors, inserted a different disk in every slot at positions 1-49 and closed the doors with no problems".

You review my record as an eBay vendor and confirm that I have a very high reputation, so you trust my description and buy the player. When you receive it you load 50 disks and start testing, but only the disks in slots 1-20 work correctly: any disk placed after slot # 20 doesn't play! You post a public message describing the failure, and this leads several people to join a common effort to fix the player. After several tests and numerous new ideas, I post a method that not only fixes the problem with the disks after # 20, but also opens the doors of a second, previously unnoticed disk storage area that doubles the original player capacity to 100 disks. You are happy with your fully working 100-disk player, so you post a message to inform all participants that the repaired player now works correctly, but you also add a small note mentioning that "the original player description is wrong". However, instead of just accepting that the player did NOT "work correctly" with 50 disks, I answer this way:

"I still stand by the integrity of my eBay description :twisted: In that description I stated that 50 disks can be inserted in the player, but disk # 50 could not be used. That is true. But I was not aware that disks in slots after # 20 don't play. (I didn't test any disk in slots 21-50 before setting up the eBay auction, except for slot # 50). More importantly, I was not aware that the player mechanism could work with certain disks, but fail with others. So I was correct that 50 disks could be used, but the complexity of the disk player mechanism means it cannot be used with all disks in the same way".

You get angry at this answer because, in your opinion, the eBay description grossly misrepresents the item, so you elaborate on the points that you think should have been obvious from the very beginning. The purpose of a "50 disks player and changer" is to play and change 50 disks, not just 20! You are not interested in knowing that three different people successfully inserted 50 disks in the player; you want to know why none of them tested a disk in the 21-49 range! You do not want to hear the involved reasons I devised to justify not completing a single test on the disks in the 21-49 range; you want to know why I didn't include this simple phrase in the item description: "I have not tested any disk in the 21-49 range". (You think that all these points add up to a huge and incomprehensible omission of very important details about the fundamentals of a "50 disks player and changer" device, details that could have been obtained in an extremely simple way). Finally, you do not want to know what happened; you just want me to accept that I was wrong when I said that the player "works correctly"...

After you post this message, I answer this way:

"Regarding my eBay description, I've already acknowledged that it needs improvement. I was under the false impression that all disks are played in the same way, and I was unaware that high disk numbers could affect things - that is an important discovery.

Yes, I had tested some disks, but only enough to discover that they correspond to the slot, and not enough to discover that the higher disks do not work. My misunderstanding could have led to the wrong conclusion that all disks work the same.

Now, you have made a critical new discovery that higher disk slots work differently, and that has led to a collective effort that resulted in a new set of disk slots that no one knew about before. Why do you complain? I think that is something to celebrate :D"

Antonio

#60 Post by Aacini » 24 Apr 2017 21:30

The original "nested FOR /F commands" method used in this application requires that input lines have enough tokens, so that the last nested FOR /F command processes at least one token; otherwise, the commands placed inside that last nested FOR /F are not executed at all. This means that files with a varying number of tokens per line could not be processed using the original method.
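
This failure mode is easy to reproduce on a small scale. The sketch below uses blocks of 2 tokens instead of 31 and two made-up input strings; it is only an illustration of the behavior, not code taken from MakeForTokens.bat.

```batch
@echo off
rem The outer FOR /F takes tokens 1-2 and leaves the rest of the line in %%C.
rem The inner FOR /F parses that remainder into %%X and %%Y.
for %%L in ("a,b,c,d" "a,b") do (
  for /f "tokens=1-2* delims=," %%A in (%%L) do (
    for /f "tokens=1-2 delims=," %%X in ("%%C") do echo %%A %%B %%X %%Y
  )
)
```

Only "a b c d" is printed. For the second string the remainder is empty, so the inner FOR /F never runs its body and tokens a and b are lost along with it.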

I modified the method in a way that it can now process a variable number of tokens per line, from 1 up to a given maximum number. The third version of my MakeForTokens.bat application can generate code for both fixed-number and variable-number of tokens; you just need to include the /V switch in order to generate the variable-tokens version of the code. The modifications in the variable-tokens code vs. the original method are small and fully explained in the ForTokens.bat generated program. This is MakeForTokens.bat program Version 3:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem MakeForTokens.bat application written by Antonio Perez Ayala
rem Version 2: Use the new method to generate FOR /F replaceable parameters
rem Version 3: Correctly process variable number of tokens

if "%~1" neq "" if "%~1" neq "/?" goto begin

:usage
echo/
echo Create ForTokens.bat file as the prototype of a program that can read a very
echo large number of tokens from a file via a series of nested FOR /F commands.
echo/
echo    MakeForTokens.bat numTokens [/V]
echo/
echo The number of tokens must be between 32 and 4094.
echo/
echo After the ForTokens.bat file is created, you must rename and modify it to suit
echo your needs; full descriptions of the required modifications are included.
echo/
echo The management of such amount of tokens is done via a series of nested FOR /F
echo commands with 31 tokens each; however, if an input line have not enough number
echo of tokens for the *last* nested FOR /F command, no one token will be processed.
echo/
echo If /V switch is given, a slightly different code is generated that correctly
echo process a variable number of tokens, from 1 up to the given number; the
echo differences are small and fully described in the ForTokens.bat program.
echo/
echo If you give 300 in the number of tokens, the generated ForTokens.bat program
echo may run with no modifications over an example data file that is created.
goto :EOF


:begin
set /A "numTokens=0, numTokens=%~1, i=0" 2>NUL
if %numTokens% lss 32 goto usage
if %numTokens% gtr 4094 goto usage
for /F "delims=:" %%a in ('findstr /N /B ":Header :Middle :Trailer" "%~F0"') do (
   set /A i+=1
   set "line[!i!]=%%a"
)
< "%~F0" (
   echo @echo off ^& setlocal EnableDelayedExpansion ^& set "numTokens=%numTokens%"
   for /L %%i in (1,1,%line[1]%) do set /P "="
   set /A copy=line[2]-line[1]-3
   for /L %%i in (1,1,!copy!) do set "line=" & set /P "line=" & echo/!line!
   if /I "%~2" neq "/V" (
      for /L %%i in (1,1,3) do set /P "="
      set /A copy=line[3]-line[2]-3, skip=line[4]-line[3]+3
      for /L %%i in (1,1,!copy!) do set "line=" & set /P "line=" & echo/!line!
      for /L %%i in (1,1,!skip!) do set /P "="
   ) else (
      set /A skip=line[3]-line[2]+3, copy=line[4]-line[3]-3
      for /L %%i in (1,1,!skip!) do set /P "="
      for /L %%i in (1,1,!copy!) do set "line=" & set /P "line=" & echo/!line!
      for /L %%i in (1,1,3) do set /P "="
   )
   findstr "^"
) > ForTokens.bat
echo ForTokens.bat file created with support for %numTokens% tokens
goto :EOF


rem The following sections define the contents of the ForTokens.bat file


:Header

Rem/For  This is a base program that process a file via FOR /F commands with up to %numTokens% tokens
Rem/For  This program was created using MakeForTokens.bat application written by Antonio Perez Ayala

Rem/For  Step 0:
Rem/For    In order to use this program, you should have a data file with many tokens to process.
Rem/For    A simple data file with 305 tokens is created here, so the examples below works correctly.
( for /L %%i in (1,1,305) do set /P "={%%i}," ) < NUL > dataFile.txt

Rem/For  Step 1:
Rem/For    Define the series of auxiliary variables that will be used as FOR tokens.
Rem/For    The DefineForTokens subroutine use the value of numTokens variable as input.
call :DefineForTokens

Rem/For  Step 2: (optional, but very useful)
Rem/For    Define an auxiliary variable that will contain the desired tokens when it is %expanded%.
Rem/For    This variable is created from a string similar to the original FOR /F "tokens=x,y,m-n" one,
Rem/For    but that allows larger token numbers, ranges in descending order and increments greater than 1;
Rem/For    the number of created tokens is also returned. See the full features description later.
Rem/For    You must enclose the tokens definition between quotes, like in FOR /F command.
Rem/For    In the next example the variable is called "tokens" and it contains these tokens/elements:
Rem/For    10 28 29 30 31 32 170 167 164 161, and "tokens.length=10" variable is also created.
call :ExpandTokensString "tokens=10,28-32,170-161-3"
echo This definition:  tokens=10,28-32,170-161-3  created %tokens.length% tokens

Rem/For  Step 3: (optional)
Rem/For    Define the variable with the "delims" value that will be used in the nested FOR's.
Rem/For    This variable must be named "delims" and it contains *the definition* of
Rem/For    the same option in the FOR /F command, including the "delims=" word itself. For example,
Rem/For    if you want no delims (default: TAB+space), delete "delims" variable: set "delims="
set "delims=delims=,"

Rem/For  Step 4:
Rem/For    Create the macro that contain the nested FOR's. This step must be performed after both
Rem/For    FOR tokens and "delims" variables was defined and before enter to the main FOR /F command.
Rem/For    The CreateNestedFors subroutine use the value of numTokens variable as input.


:Middle_Standard
call :CreateNestedFors

Rem/For  Step 5:
Rem/For    This is the main FOR /F command that process the input file.
Rem/For    There is one additional nested FOR /F command for each set of 31 tokens (or part of).
Rem/For    You must include the right filename and optional "eol=" part in the next line:
for /F "usebackq tokens=1-31* %delims%" %%%$1% in ("dataFile.txt") do %NestedFors% (

   Rem/For  Step 6:
   Rem/For    Process the tokens. To just show they, use the "tokens" variable defined above:
   echo Tokens: %tokens%

   Rem/For    You may process "tokens" values via a plain FOR command:
   for %%a in (%tokens%) do echo Token via FOR: %%a

   Rem/For    ... or via another FOR /F command:
   for /F "tokens=1-%tokens.length%" %%a in ("%tokens%") do (
      echo Token #1 in FOR /F: %%a
      echo Token #6 in FOR /F: %%f
      echo Token #9 in FOR /F: %%i
   )

   Rem/For    You may also directly use the auxiliary "$#" tokens variables. See description below.
   echo Token #242 via its token variable: %%%$242%
   echo Full path of token #273: %%~F%$273%

   Rem/For    If there are additional tokens after the numTokens number used to create this file,
   Rem/For    they will be grouped into the next token. For example, if this file was created via
   Rem/For    MakeForTokens.bat 300, then you may show the tokens beyond 300 this way:
   echo Additional tokens after the #300: %%%$301%

Rem/For  Closing parenthesis of the main FOR /F command
)

goto :EOF


:Middle_Variable
Rem/For    The /V switch is needed in order to manage a Variable number of tokens.
call :CreateNestedFors /V

Rem/For  Step 5:
Rem/For    This is the main FOR /F command that process the input file.
Rem/For    There is one additional nested FOR /F command for each set of 31 tokens (or part of).
Rem/For    The next variable is needed in order to manage a Variable number of tokens.
set /A "baseToken=(numTokens-1)/31*31+1"
Rem/For    You must include the right filename and optional "eol=" part in the next line:
for /F "usebackq tokens=1-31* %delims%" %%%$1% in ("dataFile.txt") do %NestedFors% call :ForTokensBody %baseToken%

Rem/For  End of program, after the file was processed
goto :EOF


Rem/For  Step 6:
Rem/For    This is the subroutine that processes the tokens (the body in the nested FOR commands)

:ForTokensBody baseToken

Rem/For  Get the number of the last token in the current input line
set /A lastToken=%~1-1, newLast=lastToken+16
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"
set /A newLast=lastToken+8
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"
set /A newLast=lastToken+4
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"
set /A newLast=lastToken+2
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"
set /A newLast=lastToken+1
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"

Rem/For  To process the tokens in the current line, you must use the lastToken variable in combination
Rem/For  with the ExpandTokensString subroutine and its /V switch. A simple example:
call :ExpandTokensString "tokenValue=%lastToken%" /V
echo The last token is the number %lastToken% with value: "%tokenValue%"

Rem/For  A larger example. To get the 10 values placed before the tenth from last token:
set /A startToken=lastToken-19, endToken=startToken+9
call :ExpandTokensString "tokens=%startToken%-%endToken%" /V
echo Value of the ten tokens before the tenth from last one ("tokens=%startToken%-%endToken%"):
echo %tokens%

Rem/For  You may process "tokens" values via a plain FOR command:
for %%a in (%tokens%) do echo Token via FOR: %%a

Rem/For  ... or via another FOR /F command:
for /F "tokens=1-10" %%a in ("%tokens%") do (
   echo Token #1 in FOR /F: %%a
   echo Token #6 in FOR /F: %%f
   echo Token #9 in FOR /F: %%i
)

Rem/For  You may also directly use the auxiliary "$#" tokens variables. See description below.
Rem/For  In this case (with /Variable number of tokens) you must access the tokens variables
Rem/For  from *inside* an active FOR command and verify that such a token exists:
for %%a in (_) do (
   if %lastToken% geq 242 echo Token #242 via its token variable: %%%$242%
   if %lastToken% geq 295 echo Full path of token #295: %%~F%$295%
)

Rem/For  If there are additional tokens after the numTokens number used to create this file,
Rem/For  they will be grouped into the next token. For example, if this file was created via
Rem/For  MakeForTokens.bat 300 /V, then you may show the tokens beyond 300 this way:
for %%a in (_) do if %lastToken% equ 300 if "%%%$301%" neq "" (
   echo Additional tokens after the #300: %%%$301%
)

Rem/For  End of ForTokensBody subroutine
exit /B


:Trailer



Support subroutines. You must not modify any code below this line,
but all these explanations may be removed.


The next subroutine defines the auxiliary variables that are used to access FOR /F tokens.

These variables are *called* $1, $2, etc., so %$43% is *the token* number 43, and %%%$43% is
*the value* of such a token when this construct is placed inside a FOR /F command;
note that the %%!$43! Delayed Expansion version does *not* work.
The usual FOR modifiers may be used: %%~NX%$1284% expands to name and extension of token 1284.

The method to create these variables was originally written by DosTips.com user dbenham. See:
http://www.dostips.com/forum/viewtopic.php?f=3&t=7703&p=51595#p51595

This subroutine does not have any SETLOCAL. It uses the value of the numTokens variable as input
and modifies and deletes these variables: _tokens, _pages, _hex.

:DefineForTokens

set /A "_tokens=numTokens+1, _pages=(_tokens/256+1)*2"
set "_hex= 0 1 2 3 4 5 6 7 8 9 A B C D E F"
call set "_pages=%%_hex:~0,%_pages%%%"
if %numTokens% gtr 2048 echo Creating FOR tokens variables, please wait . . .
(
   echo FF FE
   for %%P in (%_pages%) do for %%A in (%_hex%) do for %%B in (%_hex%) do echo %%A%%B 3%%P 0D 00 0A 00
) > "%temp%\forTokens.hex.txt"
certutil.exe -decodehex -f "%temp%\forTokens.hex.txt" "%temp%\forTokens.utf-16le.bom.txt" >NUL
for /F "tokens=2 delims=:." %%p in ('chcp') do set "_hex=%%p"
chcp 65001 >NUL
type "%temp%\forTokens.utf-16le.bom.txt" > "%temp%\forTokens.utf8.txt"
(for /L %%N in (0,1,%_tokens%) do set /P "$%%N=")  < "%temp%\forTokens.utf8.txt"
chcp %_hex% >NUL
del "%temp%\forTokens.*.txt"
for %%v in (_tokens _pages _hex) do set "%%v="
exit /B



The next subroutine creates the series of nested FOR's that cover all required FOR tokens;
it uses the value of numTokens variable as input.
If /V switch is given, the additional code to manage Variable number of tokens is included.

:CreateNestedFors [/V]

setlocal EnableDelayedExpansion
set /A "mod=numTokens%%31, i=numTokens/31, lim=31, j=1"
if %mod% equ 0 set "mod=31"
set "NestedFors="
for /L %%i in (32,31,%numTokens%) do (
   if /I "%~1" equ "/V" (
      set "NestedFors=!NestedFors! if "%%!$%%i!" equ "" (call :ForTokensBody !j!) else"
      set "j=%%i"
   )
   if !i! equ 1 set "lim=!mod!"
   set "NestedFors=!NestedFors! for /F "tokens=1-!lim!* %delims%" %%!$%%i! in ("%%!$%%i!") do"
   set /A "i-=1"
)
for /F "delims=" %%a in ("!NestedFors!") do endlocal & set "NestedFors=%%a"
exit /B



The next subroutine expands a tokens definition string into a series of individual tokens variables/values.

The tokens definition string has the same form as the standard FOR /F "tokens=x,y,m-n" part,
except for the asterisk for "the rest of tokens" (that are always stored after the %numTokens% one).
Additionally, you may define a tokens range in descending order: "tokens=10-6" produces 10 9 8 7 6
or use an increment/decrement other than 1: "tokens=10-25+5,400-200-100" produces 10 15 20 25 400 300 200
When the subroutine ends, the number of created tokens is stored in a global variable
with the same name as the tokens one, plus ".length" added at the end.

If /Q switch is given, each token will be enclosed in Quotes.
If /V switch is given, the tokens variables are further expanded to their current Values:
- The standard tokens contain the tokens *variables* that will be expanded via %tokens% construct;
  use it when you want to process *the same tokens* in all file lines (that is the standard method).
- The /V switch further expands previous tokens variables into the real current tokens *Values*;
  use this switch when the desired tokens may change in a dynamic way in each line of the file.

:ExpandTokensString "variable=tokensDefinition" [/Q] [/V]

setlocal EnableDelayedExpansion
set "var=" & set "tokens=" & set "length=0" & set "quote=" & set "values="
if /I "%~2" equ "/Q" shift /2 & set quote="
if /I "%~2" equ "/V" shift /2 & set "values=1"
if /I "%~2" equ "/Q" set quote="
for %%a in (%~1) do (
   if not defined var (
      set "var=%%a"
   ) else for /F "tokens=1-3 delims=-+" %%i in ("%%a") do (
      if "%%j" equ "" (
         set "tokens=!tokens! !quote!%%!$%%i!!quote!" & set /A length+=1
      ) else (
         if "%%k" equ "" (set "k=1") else set "k=%%k"
         if %%i leq %%j (
            for /L %%n in (%%i,!k!,%%j) do set "tokens=!tokens! !quote!%%!$%%n!!quote!" & set /A length+=1
         ) else (
            for /L %%n in (%%i,-!k!,%%j) do set "tokens=!tokens! !quote!%%!$%%n!!quote!" & set /A length+=1
         )
      )
   )
)
if defined values for %%a in (_) do set "tokens=%tokens%"
endlocal & set "%var%=%tokens:~1%" & set "%var%.length=%length%"
exit /B
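
The range syntax accepted by ExpandTokensString can be tried in isolation. The following is only a minimal sketch, assuming nothing but plain cmd.exe: it repeats the same parsing steps, but collects the token numbers themselves instead of the token variables, so it does not need DefineForTokens. The variable names "definition" and "numbers" are made up for this demo.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical stand-alone demo, not part of MakeForTokens.bat.
rem Same parsing as ExpandTokensString, but it collects plain token numbers.
set "definition=10-25+5,400-200-100"
set "numbers="

rem The plain FOR splits the definition at the commas;
rem the FOR /F splits each part into start, end and optional step.
for %%a in (%definition%) do for /F "tokens=1-3 delims=-+" %%i in ("%%a") do (
   if "%%j" equ "" (
      rem A single token number
      set "numbers=!numbers! %%i"
   ) else (
      rem A range; the step defaults to 1 when it is not given
      if "%%k" equ "" (set "k=1") else set "k=%%k"
      if %%i leq %%j (
         for /L %%n in (%%i,!k!,%%j) do set "numbers=!numbers! %%n"
      ) else (
         for /L %%n in (%%i,-!k!,%%j) do set "numbers=!numbers! %%n"
      )
   )
)
echo Expanded:%numbers%
```

With the definition shown, this should print "Expanded: 10 15 20 25 400 300 200", the same series given in the description of the subroutine.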

The Batch file below is an example of a program that can process a variable number of tokens, up to 3813; it was developed using the new version of MakeForTokens.bat.

Code: Select all

@echo off & setlocal EnableDelayedExpansion & set "numTokens=3813"

Rem/For  Step 1: Define the series of auxiliary variables that will be used as FOR tokens.
call :DefineForTokens

Rem/For  Step 2:  Define auxiliary variables that will contain the desired tokens when they are %expanded%.
call :ExpandTokensString "tokens1-10=1-10"
set "tokens1-10=%tokens1-10: =  %"
call :ExpandTokensString "tokens11-20=11-20"
set /A restOfTokens=numTokens+1
set "restOfTokens=%%!$%restOfTokens%!"

Rem/For  Step 3:  Define the variable with the "delims" value that will be used in the nested FOR's.
set "delims="

Rem/For  Step 4: Create the macro that contains the nested FOR's.
call :CreateNestedFors /V

cls
echo/
echo This program create a text file with the number of tokens per line given by the
echo user; then show the first 20 tokens and up to last 20 tokens of each line.
echo/
echo Enter number of tokens per line separated by spaces, i.e.: 100 1234 3820 1 40
echo/
echo The maximum number of tokens that can be processed in a line is %numTokens%

:loop
echo/
echo ===========================================
echo/
set /P "toksPerLine=Tokens per line: "
if errorlevel 1 goto :EOF

rem Create the lines with numbers in just the first 20 and last 20 positions, and "X" in the rest
(for %%a in (%toksPerLine%) do (
   if %%a geq 20 (set "lim1=20") else set "lim1=%%a"
   for /L %%b in (1,1,!lim1!) do set /P "=%%b "
   set /A lim1+=1, lim2=%%a-20
   for /L %%b in (!lim1!,1,!lim2!) do set /P "=X "
   set /A lim2+=1
   if !lim2! lss !lim1! set "lim2=!lim1!"
   for /L %%b in (!lim2!,1,%%a) do set /P "=%%b "
   echo/
)) < NUL > file.txt

Rem/For  Step 5: This is the main FOR /F command that processes the input file.
set /A "baseToken=(numTokens-1)/31*31+1, line=0"
for /F "tokens=1-31*" %%%$1% in (file.txt) do %NestedFors% call :F %baseToken%

goto :loop


Rem/For  Step 6: This is the subroutine that processes the tokens
:F baseToken

Rem/For  Get the number of the last token in the current input line
set /A lastToken=%~1-1, newLast=lastToken+16
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"
set /A newLast=lastToken+8
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"
set /A newLast=lastToken+4
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"
set /A newLast=lastToken+2
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"
set /A newLast=lastToken+1
set "token=!$%newLast%!"
if %newLast% leq %numTokens% for %%a in (_) do if "%%%token%" neq "" set "lastToken=%newLast%"

rem Show the tokens in current line
set /A line+=1
echo -------------------------------------------
echo Line # %line% contain %lastToken% tokens:
for %%a in (_) do (
   if %lastToken% gtr 10 echo Tokens  1-10: %tokens1-10%
   if %lastToken% gtr 20 echo Tokens 11-20: %tokens11-20%
)
if %lastToken% gtr 40 echo . . . . . . .
set /A "lim1=((lastToken-1)/10-1)*10+1, lim2=lim1+9, lim3=lim2+1"
if %lim1% geq 21 (
   call :ExpandTokensString "tokens=%lim1%-%lim2%" /V
   echo Tokens %lim1%-%lim2%: !tokens!
)
if %lim3% leq %lastToken% (
   call :ExpandTokensString "tokens=%lim3%-%lastToken%" /V
   echo Tokens %lim3%-%lastToken%: !tokens!
)
if %lastToken% equ %numTokens% for %%a in (_) do if "%restOfTokens%" neq "" (
   echo Additional tokens: %restOfTokens%
)

exit /B


:DefineForTokens

set /A "_tokens=numTokens+1, _pages=(_tokens/256+1)*2"
set "_hex= 0 1 2 3 4 5 6 7 8 9 A B C D E F"
call set "_pages=%%_hex:~0,%_pages%%%"
if %numTokens% gtr 2048 echo Creating FOR tokens variables, please wait . . .
(
   echo FF FE
   for %%P in (%_pages%) do for %%A in (%_hex%) do for %%B in (%_hex%) do echo %%A%%B 3%%P 0D 00 0A 00
) > "%temp%\forTokens.hex.txt"
certutil.exe -decodehex -f "%temp%\forTokens.hex.txt" "%temp%\forTokens.utf-16le.bom.txt" >NUL
for /F "tokens=2 delims=:." %%p in ('chcp') do set "_hex=%%p"
chcp 65001 >NUL
type "%temp%\forTokens.utf-16le.bom.txt" > "%temp%\forTokens.utf8.txt"
(for /L %%N in (0,1,%_tokens%) do set /P "$%%N=")  < "%temp%\forTokens.utf8.txt"
chcp %_hex% >NUL
del "%temp%\forTokens.*.txt"
for %%v in (_tokens _pages _hex) do set "%%v="
exit /B


:CreateNestedFors [/V]

setlocal EnableDelayedExpansion
set /A "mod=numTokens%%31, i=numTokens/31, lim=31, j=1"
if %mod% equ 0 set "mod=31"
set "NestedFors="
for /L %%i in (32,31,%numTokens%) do (
   if /I "%~1" equ "/V" (
      set "NestedFors=!NestedFors! if "%%!$%%i!"=="" (call:F !j!)else"
      set "j=%%i"
   )
   if !i! equ 1 set "lim=!mod!"
   set "NestedFors=!NestedFors! for /F "tokens=1-!lim!*" %%!$%%i! in ("%%!$%%i!")do"
   set /A "i-=1"
)
for /F "delims=" %%a in ("!NestedFors!") do endlocal & set "NestedFors=%%a"
exit /B


:ExpandTokensString "variable=tokensDefinition" [/Q] [/V]

setlocal EnableDelayedExpansion
set "var=" & set "tokens=" & set "length=0" & set "quote=" & set "values="
if /I "%~2" equ "/Q" shift /2 & set quote="
if /I "%~2" equ "/V" shift /2 & set "values=1"
if /I "%~2" equ "/Q" set quote="
for %%a in (%~1) do (
   if not defined var (
      set "var=%%a"
   ) else for /F "tokens=1-3 delims=-+" %%i in ("%%a") do (
      if "%%j" equ "" (
         set "tokens=!tokens! !quote!%%!$%%i!!quote!" & set /A length+=1
      ) else (
         if "%%k" equ "" (set "k=1") else set "k=%%k"
         if %%i leq %%j (
            for /L %%n in (%%i,!k!,%%j) do set "tokens=!tokens! !quote!%%!$%%n!!quote!" & set /A length+=1
         ) else (
            for /L %%n in (%%i,-!k!,%%j) do set "tokens=!tokens! !quote!%%!$%%n!!quote!" & set /A length+=1
         )
      )
   )
)
if defined values for %%a in (_) do set "tokens=%tokens%"
endlocal & set "%var%=%tokens:~1%" & set "%var%.length=%length%"
exit /B

And this is an example of the output of the previous program:

Code: Select all

This program create a text file with the number of tokens per line given by the
user; then show the first 20 tokens and up to last 20 tokens of each line.

Enter number of tokens per line separated by spaces, i.e.: 100 1234 3820 1 40

The maximum number of tokens that can be processed in a line is 3813

===========================================

Tokens per line: 100 1234 3820 1 40
-------------------------------------------
Line # 1 contain 100 tokens:
Tokens  1-10: 1  2  3  4  5  6  7  8  9  10
Tokens 11-20: 11 12 13 14 15 16 17 18 19 20
. . . . . . .
Tokens 81-90: 81 82 83 84 85 86 87 88 89 90
Tokens 91-100: 91 92 93 94 95 96 97 98 99 100
-------------------------------------------
Line # 2 contain 1234 tokens:
Tokens  1-10: 1  2  3  4  5  6  7  8  9  10
Tokens 11-20: 11 12 13 14 15 16 17 18 19 20
. . . . . . .
Tokens 1221-1230: 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230
Tokens 1231-1234: 1231 1232 1233 1234
-------------------------------------------
Line # 3 contain 3813 tokens:
Tokens  1-10: 1  2  3  4  5  6  7  8  9  10
Tokens 11-20: 11 12 13 14 15 16 17 18 19 20
. . . . . . .
Tokens 3801-3810: 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810
Tokens 3811-3813: 3811 3812 3813
Additional tokens: 3814 3815 3816 3817 3818 3819 3820
-------------------------------------------
Line # 4 contain 1 tokens:
Tokens 1-1: 1
-------------------------------------------
Line # 5 contain 40 tokens:
Tokens  1-10: 1  2  3  4  5  6  7  8  9  10
Tokens 11-20: 11 12 13 14 15 16 17 18 19 20
Tokens 21-30: 21 22 23 24 25 26 27 28 29 30
Tokens 31-40: 31 32 33 34 35 36 37 38 39 40
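
The "Get the number of the last token" part of the subroutine is a binary search over the current set of 31 tokens, in steps of 16, 8, 4, 2 and 1. The following stand-alone sketch shows just that search. It makes two assumptions that are not in the program above: the variable "actual" plays the role of the real number of tokens in the line (not larger than numTokens), and the test "newLast leq actual" stands in for the original check that the token is not empty. The :Probe label is also made up for this demo.

```batch
@echo off
setlocal

rem Hypothetical stand-alone demo of the last-token search.
rem "actual" simulates a line that really contains 100 tokens.
set /A "numTokens=300, actual=100"

rem First token of the set of 31 tokens where the line ends
set /A "baseToken=(actual-1)/31*31+1"

rem Start just before the base token and try steps of 16, 8, 4, 2 and 1
set /A "lastToken=baseToken-1"
for %%s in (16 8 4 2 1) do call :Probe %%s
echo baseToken=%baseToken%, lastToken=%lastToken%
goto :EOF

:Probe step
set /A "newLast=lastToken+%1"
rem Keep the step only if the probed token exists
if %newLast% leq %numTokens% if %newLast% leq %actual% set "lastToken=%newLast%"
exit /B
```

With these values the search starts at 93, rejects the steps of 16 and 8, accepts 4, 2 and 1, and should print "baseToken=94, lastToken=100". Five probes are enough because 16+8+4+2+1 = 31, the size of one set of tokens.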

NOTE: As I stated several times before, the maximum number of tokens that can be processed in a line depends on both the length of the remaining tokens placed after the 31st one and the total length of the command line that processes all such FOR /F commands and tokens. The previous example was developed specifically to get the maximum possible number of tokens in varying-length lines. This does not mean that the same number of tokens could be correctly processed with any other data or with any other commands. You should always test the maximum number of tokens that can be processed with your particular data file.
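
As a rough first check before testing with real data, the arithmetic can be done in a few lines. This is only a sketch under stated assumptions: the 8191 figure is the line-length limit cited earlier in this thread, "avgWidth" is a hypothetical average token width that you must estimate from your own file, and the estimate ignores the length of the FOR /F commands themselves.

```batch
@echo off
setlocal

rem All values are illustrative assumptions; adjust them to your own data.
set /A "numTokens=3813, avgWidth=1, limit=8191"

rem One nested FOR /F for each further set of 31 tokens (or part thereof)
set /A "nested=(numTokens-1)/31, baseToken=nested*31+1"

rem Rough line length: each token followed by one delimiter
set /A "lineLen=numTokens*(avgWidth+1)"

echo Nested FOR /F commands: %nested%, baseToken: %baseToken%
if %lineLen% gtr %limit% (
   echo Estimated line length %lineLen% exceeds %limit%
) else (
   echo Estimated line length %lineLen% fits within %limit%
)
```

For 3813 one-character tokens this should report 122 nested FOR /F commands, a baseToken of 3783 and an estimated length of 7626. A result under the limit does not guarantee that the file can be processed; it only tells you when a test is certain to fail.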

Antonio
