Using many "tokens=..." in FOR /F command in a simple way
Posted: 07 Feb 2017 12:40
EDIT: When this topic was created there was a problem with the FOR /F command: the characters in the 128..255 extended range cannot be used as FOR /F replaceable parameters in the standard way, so the first version of this application worked with a maximum of 208 tokens and only when the active code page was 437. After a joint effort by a group of people, which produced a series of tests and new concepts, a new method to access FOR /F replaceable parameters was developed that has no limit on the number of tokens and does not depend on the active code page; you may read about the development of this method in the posts below. The second version of this application uses the new method and can process up to 4094 tokens from the same line; you may review and download it at this post.
The purpose of this topic is to assemble a method to use more tokens in a FOR /F command, as many as possible, but in a simple way. I read the description of this technique in this SO post, but the standard method is complicated: it requires an Extended ASCII table to know which character must be used for each token, and there are several special cases that need to be handled differently.
The Batch code below is an example of a general-use method that combines a series of chained FOR /F commands so that it returns lines with many tokens, up to 208, from a text file. The tokens are specified via a string similar to the original "tokens=x,y,m-n" one, but it allows token numbers up to 208. Another advantage is that the new tokens string allows repeated token numbers and token ranges in descending order; the output contains the tokens in the same order as given in the tokens string. For example, using this text file as a base:
Code: Select all
A1 A2 A3 A4 ... A177 A178 A179 A180
B1 B2 B3 B4 ... B177 B178 B179 B180
C1 C2 C3 C4 ... C177 C178 C179 C180
... these are a couple of examples of the program output:
Code: Select all
tokens=1,20,45,75,120
A1 A20 A45 A75 A120
B1 B20 B45 B75 B120
C1 C20 C45 C75 C120
tokens=30,28-32,170-165
A30 A28 A29 A30 A31 A32 A170 A169 A168 A167 A166 A165
B30 B28 B29 B30 B31 B32 B170 B169 B168 B167 B166 B165
C30 C28 C29 C30 C31 C32 C170 C169 C168 C167 C166 C165
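As an illustration of how such a tokens string can be interpreted, here is a minimal sketch (it is not the code from the .zip file) that expands a specification into the explicit list of token numbers, keeping repeated numbers and descending ranges. The variable names "spec" and "list" are hypothetical and chosen just for this example:

```batch
@echo off
setlocal EnableDelayedExpansion

rem Sketch only: spec and list are hypothetical names, not taken from the original program
set "spec=30,28-32,170-165"
set "list="

rem A plain FOR splits the spec at the commas; FOR /F then splits each item at the dash
for %%t in (%spec%) do (
   for /F "tokens=1,2 delims=-" %%a in ("%%t") do (
      if "%%b" equ "" (
         rem Single token number
         set "list=!list! %%a"
      ) else if %%a leq %%b (
         rem Ascending range
         for /L %%n in (%%a,1,%%b) do set "list=!list! %%n"
      ) else (
         rem Descending range
         for /L %%n in (%%a,-1,%%b) do set "list=!list! %%n"
      )
   )
)

echo Token numbers:!list!
```

With the spec shown, the numbers should come out in the same order as the tokens in the second example above.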
The Batch file is in this .zip file:
You may also add the other options to the series of FOR /F commands, like usebackq, delims, etc. The only option not supported in the new tokens string is an asterisk at the end to get the rest of the tokens. This could also be implemented, but it would require additional processing of the result line obtained from each file line. The method used to extract the tokens is very efficient: after preparing an equivalent "tokensValues" string based on the original tokens string, it uses one CALL command to extract all tokens from each input line via a single SET command.
Important: When I started testing this on my computer, the extended characters used as tokens simply did not work. For example, in this command:
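The chaining of FOR /F commands relies on the asterisk in the "tokens=" option, which places the rest of the line in one extra variable that the next FOR /F can split again. The following minimal sketch shows the idea with ordinary letters and a literal string instead of a file; it is an illustration under those assumptions, not the posted program:

```batch
@echo off

rem Sketch only, with plain letters as tokens and a literal string as input
rem First FOR: tokens 1-3 go to variables a, b, c and the rest of the line goes to d
rem Second FOR: the rest stored in d is split again, so tokens 4-6 go to e, f, g
for /F "tokens=1-3*" %%a in ("T1 T2 T3 T4 T5 T6") do (
   for /F "tokens=1-3" %%e in ("%%d") do (
      echo %%g %%a %%e
   )
)
```

This should display T6 T1 T4: inside the innermost block, the tokens from both FOR /F commands are available at the same time and can be used in any order.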
Code: Select all
for /F "tokens=1-5" %á in ("A B C D E") do echo "%á %í %ó %ú %ñ"
... the expected output is "A B C D E", but I got "A %í %ó %ú %ñ" instead, and the same happened with any other successive characters taken from the extended set in code page 437 order: only the first token showed its value; the rest of the tokens never got their values.
After many tests I discovered that the order of the extended characters in the 128..255 range used as successive tokens in the FOR /F command in my Windows 8.1 Spanish version was not the standard numerical order, but a very different order that I could establish with the aid of a program. The characters do not form a single chain covering all 128 characters in the 128..255 range: there is one chain of 95 characters and two small chains of 2 characters each, and the remaining characters are isolated. For this reason, the Spanish version of this program can manage a maximum of just 177 tokens, instead of 208, and the following two sections of the original code must be changed:
Code: Select all
rem Create 95 characters for 3 FOR's with "tokens=1-31*"
rem This is the tokens sequence used in Windows 8.1 Spanish
set "i=0"
for %%i in (173 189 156 207 190 221 245 249 184 166 174 170 240 169 238 248
241 253 252 239 230 244 250 247 251 167 175 172 171 243 168 183
181 182 199 142 143 146 128 212 144 210 211 222 214 215 216 209
165 227 224 226 229 153 158 157 235 233 234 154 237 232 225 133
160 131 198 132 134 145 135 138 130 136 137 141 161 140 139 208
164 149 162 147 228 148 246 155 151 163 150 129 236 231 152 ) do (
set /A i+=1, mod=i%%32
if !mod! neq 0 (
call :genchr %%i
type %%i.chr
del %%i.chr
)
)
) > FOR-Fchars.txt
del t.tmp temp.tmp
set "options="
:readChars
set /P "char=" < FOR-Fchars.txt
set "lastToken=177"
Code: Select all
rem First three FOR's use as tokens the ASCII chars in 38..124 (&..|) range: 28*3 = 84 tokens + 3 tokens for next FOR
rem Next three FOR's use as tokens Extended chars: 31*3 = 93 tokens + 2 tokens for next FOR
rem based on the tokens sequence used in Windows 8.1 Spanish
rem Total: 177 tokens
for /F "eol= tokens=1-28*" %%^& in (test.txt) do ^
for /F "eol= tokens=1-28*" %%C in ("%%B") do ^
for /F "eol= tokens=1-28*" %%` in ("%%_") do ^
for /F "eol= tokens=1-31*" %% in ("%%|") do ^
for /F "eol= tokens=1-31*" %%µ in ("%%·") do ^
for /F "eol= tokens=1-31" %% in ("%%…") do (
call :getTokens result=
rem Process here the "result" string
echo !result!
)
goto nextSet
This means that I have NOT tested the code I posted at the beginning, because I do not have access to any PC with an English Windows version right now. I assume that the program should work, based on the comments by jeb, Dave and npocmaka on the referred SO answer stating that all these characters work (except 0xFF, which I don't use), but I could still have made an error in the characters I used in the last four chained FOR /F commands. I would appreciate it if someone could test the original code on any English Windows version and confirm whether it works. I also suppose that the sequence of successive characters will be different in Windows versions in other languages...
Antonio