Using many "tokens=..." in FOR /F command in a simple way

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Using many "tokens=..." in FOR /F command in a simple way

#31 Post by aGerman » 23 Feb 2017 08:38

penpen wrote:according to the java documentation UTF-16 is a little bit "ugly" there.

Code: Select all

1.no surrogate in [0x0000 : 0xD7FF]
high surrogate in [0xD800 : 0xDBFF]
low  surrogate in [0xDC00 : 0xDFFF]
2.no surrogate in [0xE000 : 0xFFFF]
All non surrogate code units in [0xE000 : 0xFFFF] would be treated as low surrogates.

Thanks for pointing that out! Fortunately, the comparisons I used in my utility already follow these ranges :)
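
For reference, the ranges quoted above can be written as a small classification function. This is only a minimal Python sketch of those ranges, not the code of the utility discussed here; the name `classify_unit` is made up for illustration.

```python
def classify_unit(unit: int) -> str:
    """Classify a UTF-16 code unit as 'high', 'low', or 'none' (no surrogate)."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("a UTF-16 code unit must fit in 16 bits")
    if 0xD800 <= unit <= 0xDBFF:
        return "high"
    if 0xDC00 <= unit <= 0xDFFF:
        return "low"
    # Both [0x0000, 0xD7FF] and [0xE000, 0xFFFF] are ordinary code units.
    return "none"


if __name__ == "__main__":
    for unit in (0xD7FF, 0xD800, 0xDBFF, 0xDC00, 0xDFFF, 0xE000):
        print(f"0x{unit:04X}: {classify_unit(unit)}")
```

Testing only `unit >= 0xDC00` for low surrogates would wrongly catch everything in [0xE000 : 0xFFFF], which is the pitfall described in the quote.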

penpen wrote:This part may be risky if the first hex value does not equal "00";
I don't understand why :?
penpen wrote:also the last "00" may be unneeded.
True. But I only read the number of bytes determined from dummy.txt, so it won't cause an error.

Anyway, thanks for your corrections. It works great!

penpen wrote:If you want to do that for any other codepage, too, then we need to find out, how to list all character units in a codepage.

We should rather ask Antonio :wink:
As I understand it, the tool should work with the default OEM code page of a given machine. The goal is to work around the FOR /F token limit. Apart from that, I think it should behave the same.

Steffen

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Using many "tokens=..." in FOR /F command in a simple way

#32 Post by penpen » 23 Feb 2017 10:40

aGerman wrote:
penpen wrote:This part may be risky if the first hex value does not equal "00";
I don't understand why :?
Sorry, I had mixed up two points (without finishing either of them).
(I really should stop posting after 2:00 am.)

What I wanted to write:
1) If the content of X is not zero, it might mess up the dump output (it steals a "00" if it is too big, and 'donates' some zeroes if it is negative).
2) If the first hex value equals "00", then the first Y might be 1 (or higher), and in that case you get an additional "00" (if %X% equals zero).


Sidenote:
I have tested your CONVERTCP utility and read the source code.
I saw no errors, but I noticed that your tool does more than just convert between code pages: it also approximates characters that are not in the target code page (which is not that bad, because cmd.exe does the same, but I would mention it somewhere).
For example, I created a file "string.txt" with this content (I hope it is not corrupted), encoded as UTF-8:

Code: Select all

ĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩ

If you convert it to codepage 850 you get:

Code: Select all

AaAaAaCcCcCcCcDdDdEeEeEeEeEeGgGgGgGgHhHhIi

As far as I know, the recommended behaviour in such cases is to substitute the REPLACEMENT CHARACTER, a question mark, a square, or a question mark in a square.

While playing with this tool I got the idea to create a UTF-8 file (because that is easy to build) that contains all valid code points, and then convert this file to UTF-16LE/BE (because the characters are easier to track). To detect the mapping of a code page, just use your tool to convert to the target code page and back. All characters that are unchanged in the UTF-16 files belong to that code page.
The only drawback is that building the UTF-8 file (~4 MB) takes a while when using pure batch and merging single-byte files; but we could create an exe (via C# or so) for that (from given source).
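
The round-trip idea can be tried outside of batch. The following Python sketch is only a model of it: it uses Python's own `cp850` codec in strict mode (which rejects unmappable characters instead of approximating them), not CONVERTCP, and it only walks the BMP. The helper name `codepage_members` is hypothetical.

```python
def codepage_members(codec: str) -> dict:
    """Map every BMP character that survives a strict round trip to its byte value."""
    members = {}
    for code_point in range(0x10000):
        char = chr(code_point)
        try:
            encoded = char.encode(codec, errors="strict")
        except UnicodeEncodeError:
            continue  # not representable in this code page (surrogates land here too)
        if len(encoded) == 1 and encoded.decode(codec) == char:
            members[char] = encoded[0]
    return members


if __name__ == "__main__":
    members = codepage_members("cp850")
    print(len(members), "characters survive the round trip")
    print("U+00C7 ->", hex(members["\u00c7"]))
```

Because strict mode refuses to approximate, a character like U+0100 is simply reported as missing rather than being turned into "A".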


penpen

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Using many "tokens=..." in FOR /F command in a simple way

#33 Post by aGerman » 23 Feb 2017 11:14

penpen wrote:Sidenote:

I'd rather answer that in the other thread.

Steffen

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Using many "tokens=..." in FOR /F command in a simple way

#34 Post by dbenham » 14 Mar 2017 21:55

Fascinating topic :D

I'm a bit late to join in. But hopefully better late than never.

Long ago I had verified that values 0x80-0xFE are all available to FOR /F on my machine, which happens to use code page 437. But I only confirmed individual characters, one at a time. I did not bother to look at the sequencing, or investigate the effect of changing the code page.

Here is a summary of my understanding of the discussion so far - nothing new here

Based on my understanding of what has been written, plus some experimentation on my own, it looks like Windows interprets characters based on the active code page, and stores the value internally as the UTF-16? or UTF-32? code point. When parsing FOR tokens, the tokens are stored in a (presumably 0 based) array, and the base character establishes the UTF-?? code point offset. The offset is subtracted from each FOR /F character to determine the corresponding index into the array of tokens.
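
The offset arithmetic in this theory is easy to state as code. Here is a minimal Python sketch, assuming that Python's `cp437` and `cp850` codecs match the Windows code pages of the same number; `token_position` is a hypothetical name, and the result is 1-based to match the token numbers used in this thread.

```python
def token_position(base_byte: int, byte: int, codec: str) -> int:
    """Return the 1-based token a byte would select, under the offset theory."""
    base_cp = ord(bytes([base_byte]).decode(codec))
    cp = ord(bytes([byte]).decode(codec))
    return cp - base_cp + 1


if __name__ == "__main__":
    # Base 0x80 is U+00C7 in both code pages.
    print(token_position(0x80, 0x90, "cp437"))
    print(token_position(0x80, 0xD4, "cp850"))
```

A result below 1 or above 32 means the character cannot be reached from that base with "tokens=1-31*".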

Most code pages used on this forum interpret 0x00-0x7F as ASCII, and the UTF-?? code points are the same. So all characters in the range 0x01-0x7F are available to FOR /F (except for 0x0D because that can only be accessed via delayed expansion, which does no good for FOR /F).

But the mapping of high-order byte characters varies tremendously. Although nearly all those characters can be used individually, there are often gaps in the mapping, which limits their effectiveness.

Aacini developed a utility to discover contiguous ranges of high-order byte characters. However, it is a bit slow and tedious to use, and it does not show the relationship of threads to each other. But it was a critical contribution that helped lead to the theory as it stands now.

aGerman and penpen developed a fast utility to convert a given code page into a list of byte codes with corresponding UTF-16 code point, sorted by the UTF-16. This could be very useful in establishing contiguous ranges, assuming the theory is correct. But the results still need to be verified against actual FOR /F behavior. I find the final line that lists the characters in code point order to be a bit useless because it is impossible to see where there are gaps.
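
To make the gaps visible, the sorted list can be split into contiguous runs. The following Python sketch is not the aGerman/penpen utility; it is a stand-in that relies on Python's built-in codecs (assumed to match the Windows code pages), and the names `sorted_map` and `contiguous_runs` are made up for illustration.

```python
def sorted_map(codec: str, first: int = 0x80, last: int = 0xFF):
    """Return (code_point, byte) pairs for the given byte range, sorted by code point."""
    pairs = []
    for byte in range(first, last + 1):
        try:
            char = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page
        pairs.append((ord(char), byte))
    return sorted(pairs)


def contiguous_runs(pairs):
    """Split sorted (code_point, byte) pairs into runs without gaps."""
    runs = []
    for code_point, byte in pairs:
        if runs and code_point == runs[-1][-1][0] + 1:
            runs[-1].append((code_point, byte))
        else:
            runs.append([(code_point, byte)])
    return runs


if __name__ == "__main__":
    for run in contiguous_runs(sorted_map("cp850")):
        start, end = run[0][0], run[-1][0]
        print(f"U+{start:04X}..U+{end:04X}: " + " ".join(f"{b:02X}" for _, b in run))
```

Each printed line is one gap-free range, so a new line always marks a gap in the code point sequence.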

Here begins my contribution

First off, here is my computer info:

Code: Select all

--------------------------------------------------------------------------------
Windows version        :  Microsoft Windows [Version 10.0.14393]
Product name           :  Windows 10 Pro, 64 bit
Performance indicators :  Processor Cores: 4      Visible RAM: 4192432 kilobytes

Date/Time format       :  (mm/dd/yy)  Sun 03/12/2017  17:37:39.09
__APPDIR__             :  C:\WINDOWS\system32\
ComSpec                :  C:\WINDOWS\system32\cmd.exe
PathExt                :  .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
Extensions             :  system: Enabled   user: Enabled 
Delayed expansion      :  system: Disabled  user: Disabled
Locale name            :  en-US       Code Pages: OEM  437    ANSI 1252
DIR  format            :  03/08/2017  12:46 AM     7,812,395,008 pagefile.sys
Permissions            :  Elevated Admin=No, Admin group=Yes

                          Missing from the tool collection:  debug
I decided to write my own utilities to probe FOR /F behavior.

The first utility, probeFOR.bat, simply uses %1 to establish a base variable, represented as a hex value, and requests tokens 1-31*. It then attempts to echo the FOR variable for every character from 0x01-0xFF (except 0x0D), in order to discover which characters map to which tokens. I use FINDSTR and SORT to get a sorted list of the characters that map. Note that token [32] actually represents the remainder of the line after token 31.

It is most useful when there are characters that map to all 32 tokens. But even when there are gaps, it is still useful in building non-contiguous threads that will be useful in testing the UTF code point theory.

This utility will only work with single byte code pages that interpret 0x00-0x7F as ASCII.
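
What the probe should report, if the code point theory holds, can be predicted without running cmd.exe at all. The sketch below is a Python model of that prediction, not a port of probeFOR.bat; it assumes Python's `cp437`/`cp850` codecs match the Windows code pages, and `probe` is a hypothetical function name.

```python
def probe(base_byte: int, codec: str):
    """List (byte, token) pairs that a base character would reach in tokens 1-32."""
    base_cp = ord(bytes([base_byte]).decode(codec))
    hits = []
    for byte in range(0x01, 0x100):
        if byte == 0x0D:
            continue  # <CR> is skipped, exactly as in the batch utility
        token = ord(bytes([byte]).decode(codec)) - base_cp + 1
        if 1 <= token <= 32:
            hits.append((byte, token))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for byte, token in probe(0x80, "cp437"):
        print(f'\\x{byte:02X} = "[{token:02d}]"')
```

The output uses the same `\xNN = "[tt]"` layout as the utility, so predicted and actual listings can be compared line by line.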

The utility has lots of bytes that do not post well on the forum, so it is pointless to post the code. I've posted the zipped file instead:
probeFOR.zip
See http://stackoverflow.com/a/8520993/1012053 for a table that shows which characters can be used as a base point.

Example usage (code page 437):

Code: Select all

C:\test>probeFOR 01
\x01 = "[01]"
\x02 = "[02]"
\x03 = "[03]"
\x04 = "[04]"
\x05 = "[05]"
\x06 = "[06]"
\x07 = "[07]"
\x08 = "[08]"
\x09 = "[09]"
\x0A = "[10]"
\x0B = "[11]"
\x0C = "[12]"
\x0E = "[14]"
\x0F = "[15]"
\x10 = "[16]"
\x11 = "[17]"
\x12 = "[18]"
\x13 = "[19]"
\x14 = "[20]"
\x15 = "[21]"
\x16 = "[22]"
\x17 = "[23]"
\x18 = "[24]"
\x19 = "[25]"
\x1A = "[26]"
\x1B = "[27]"
\x1C = "[28]"
\x1D = "[29]"
\x1E = "[30]"
\x1F = "[31]"
\x20 = "[32]"
Note that 0x0D <CR>, token 13 is missing, as expected.

Some characters, like space, cannot be used as a starting point:

Code: Select all

C:\test>probeFOR 20
% was unexpected at this time.
Other characters like < must be escaped as ^<, so both characters must be passed as hex:

Code: Select all

C:\test>probeFOR 5E3C
\x3C = "[01]"
\x3D = "[02]"
\x3E = "[03]"
\x3F = "[04]"
\x40 = "[05]"
\x41 = "[06]"
\x42 = "[07]"
\x43 = "[08]"
\x44 = "[09]"
\x45 = "[10]"
\x46 = "[11]"
\x47 = "[12]"
\x48 = "[13]"
\x49 = "[14]"
\x4A = "[15]"
\x4B = "[16]"
\x4C = "[17]"
\x4D = "[18]"
\x4E = "[19]"
\x4F = "[20]"
\x50 = "[21]"
\x51 = "[22]"
\x52 = "[23]"
\x53 = "[24]"
\x54 = "[25]"
\x55 = "[26]"
\x56 = "[27]"
\x57 = "[28]"
\x58 = "[29]"
\x59 = "[30]"
\x5A = "[31]"
\x5B = "[32]"
I have confirmed that all characters in the range 0x01-0x7F map contiguously (except, of course, for 0x0D).

But beginning with 0x80, there may be gaps. Here are the results for code pages 437 and 850:

Code: Select all

C:\test>chcp
Active code page: 437

C:\test>probeFOR 80
\x80 = "[01]"
\x90 = "[03]"
\xA5 = "[11]"
\x99 = "[16]"
\x9A = "[22]"
\xE1 = "[25]"
\x85 = "[26]"
\xA0 = "[27]"
\x83 = "[28]"
\x84 = "[30]"
\x86 = "[31]"
\x91 = "[32]"

C:\test>chcp 850
Active code page: 850

C:\test>probeFOR 80
\x80 = "[01]"
\xD4 = "[02]"
\x90 = "[03]"
\xD2 = "[04]"
\xD3 = "[05]"
\xDE = "[06]"
\xD6 = "[07]"
\xD7 = "[08]"
\xD8 = "[09]"
\xD1 = "[10]"
\xA5 = "[11]"
\xE3 = "[12]"
\xE0 = "[13]"
\xE2 = "[14]"
\xE5 = "[15]"
\x99 = "[16]"
\x9E = "[17]"
\x9D = "[18]"
\xEB = "[19]"
\xE9 = "[20]"
\xEA = "[21]"
\x9A = "[22]"
\xED = "[23]"
\xE8 = "[24]"
\xE1 = "[25]"
\x85 = "[26]"
\xA0 = "[27]"
\x83 = "[28]"
\xC6 = "[29]"
\x84 = "[30]"
\x86 = "[31]"
\x91 = "[32]"
Using nothing but probeFOR.bat, I was able to tediously build a complete map for code page 850. I also added the UTF-16 code points to show that the results are consistent with the theory.

Code: Select all

CHCP 850           FF-00A0 (non-breaking space) is inaccessible
                                   
Rel |     T       H       R       E       A       D       S     |
Pos |    1    |    2    |    3    |    4    |    5    |    6    |
----+---------+---------+---------+---------+---------+---------+
  1 | 01-0001 | AD-00A1 | C4-2500 | 9F-0192 | D5-0131 | F2-2017 |
  2 | 02-0002 | BD-00A2 |    2501 | *       | *       | *       |
  3 | 03-0003 | 9C-00A3 | B3-2502 |         |         |         |
  4 | 04-0004 | CF-00A4 |    2503 |         |         |         |
  5 | 05-0005 | BE-00A5 |    2504 |         |         |         |
  6 | 06-0006 | DD-00A6 |    2505 |         |         |         |
  7 | 07-0007 | F5-00A7 |    2506 |         |         |         |
  8 | 08-0008 | F9-00A8 |    2507 |         |         |         |
  9 | 09-0009 | B8-00A9 |    2508 |         |         |         |
 10 | 0A-000A | A6-00AA |    2509 |         |         |         |
 11 | 0B-000B | AE-00AB |    250A |         |         |         |
 12 | 0C-000C | AA-00AC |    250B |         |         |         |
 13 |    000D | F0-00AD | DA-250C |         |         |         |
 14 | 0E-000E | A9-00AE |    250D |         |         |         |
 15 | 0F-000F | EE-00AF |    250E |         |         |         |
 16 | 10-0010 | F8-00B0 |    250F |         |         |         |
 17 | 11-0011 | F1-00B1 | BF-2510 |         |         |         |
 18 | 12-0012 | FD-00B2 |    2511 |         |         |         |
 19 | 13-0013 | FC-00B3 |    2512 |         |         |         |
 20 | 14-0014 | EF-00B4 |    2513 |         |         |         |
 21 | 15-0015 | E6-00B5 | C0-2514 |         |         |         |
 22 | 16-0016 | F4-00B6 |    2515 |         |         |         |
 23 | 17-0017 | FA-00B7 |    2516 |         |         |         |
 24 | 18-0018 | F7-00B8 |    2517 |         |         |         |
 25 | 19-0019 | FB-00B9 | D9-2518 |         |         |         |
 26 | 1A-001A | A7-00BA |    2519 |         |         |         |
 27 | 1B-001B | AF-00BB |    251A |         |         |         |
 28 | 1C-001C | AC-00BC |    251B |         |         |         |
 29 | 1D-001D | AB-00BD | C3-251C |         |         |         |
 30 | 1E-001E | F3-00BE |    251D |         |         |         |
 31 | 1F-001F | A8-00BF |    251E |         |         |         |
 32 | 20-0020 | B7-00C0 |    251F |         |         |         |
 33 | 21-0021 | B5-00C1 |    2520 |         |         |         |
 34 | 22-0022 | B6-00C2 |    2521 |         |         |         |
 35 | 23-0023 | C7-00C3 |    2522 |         |         |         |
 36 | 24-0024 | 8E-00C4 |    2523 |         |         |         |
 37 | 25-0025 | 8F-00C5 | B4-2524 |         |         |         |
 38 | 26-0026 | 92-00C6 |    2525 |         |         |         |
 39 | 27-0027 | 80-00C7 |    2526 |         |         |         |
 40 | 28-0028 | D4-00C8 |    2527 |         |         |         |
 41 | 29-0029 | 90-00C9 |    2528 |         |         |         |
 42 | 2A-002A | D2-00CA |    2529 |         |         |         |
 43 | 2B-002B | D3-00CB |    252A |         |         |         |
 44 | 2C-002C | DE-00CC |    252B |         |         |         |
 45 | 2D-002D | D6-00CD | C2-252C |         |         |         |
 46 | 2E-002E | D7-00CE |    252D |         |         |         |
 47 | 2F-002F | D8-00CF |    252E |         |         |         |
 48 | 30-0030 | D1-00D0 |    252F |         |         |         |
 49 | 31-0031 | A5-00D1 |    2530 |         |         |         |
 50 | 32-0032 | E3-00D2 |    2531 |         |         |         |
 51 | 33-0033 | E0-00D3 |    2532 |         |         |         |
 52 | 34-0034 | E2-00D4 |    2533 |         |         |         |
 53 | 35-0035 | E5-00D5 | C1-2534 |         |         |         |
 54 | 36-0036 | 99-00D6 |    2535 |         |         |         |
 55 | 37-0037 | 9E-00D7 |    2536 |         |         |         |
 56 | 38-0038 | 9D-00D8 |    2537 |         |         |         |
 57 | 39-0039 | EB-00D9 |    2538 |         |         |         |
 58 | 3A-003A | E9-00DA |    2539 |         |         |         |
 59 | 3B-003B | EA-00DB |    253A |         |         |         |
 60 | 3C-003C | 9A-00DC |    253B |         |         |         |
 61 | 3D-003D | ED-00DD | C5-253C |         |         |         |
 62 | 3E-003E | E8-00DE |    253D |         |         |         |
 63 | 3F-003F | E1-00DF |    253E |         |         |         |
 64 | 40-0040 | 85-00E0 |    253F |         |         |         |
 65 | 41-0041 | A0-00E1 |    2540 |         |         |         |
 66 | 42-0042 | 83-00E2 |    2541 |         |         |         |
 67 | 43-0043 | C6-00E3 |    2542 |         |         |         |
 68 | 44-0044 | 84-00E4 |    2543 |         |         |         |
 69 | 45-0045 | 86-00E5 |    2544 |         |         |         |
 70 | 46-0046 | 91-00E6 |    2545 |         |         |         |
 71 | 47-0047 | 87-00E7 |    2546 |         |         |         |
 72 | 48-0048 | 8A-00E8 |    2547 |         |         |         |
 73 | 49-0049 | 82-00E9 |    2548 |         |         |         |
 74 | 4A-004A | 88-00EA |    2549 |         |         |         |
 75 | 4B-004B | 89-00EB |    254A |         |         |         |
 76 | 4C-004C | 8D-00EC |    254B |         |         |         |
 77 | 4D-004D | A1-00ED |    254C |         |         |         |
 78 | 4E-004E | 8C-00EE |    254D |         |         |         |
 79 | 4F-004F | 8B-00EF |    254E |         |         |         |
 80 | 50-0050 | D0-00F0 |    254F |         |         |         |
 81 | 51-0051 | A4-00F1 | CD-2550 |         |         |         |
 82 | 52-0052 | 95-00F2 | BA-2551 |         |         |         |
 83 | 53-0053 | A2-00F3 |    2552 |         |         |         |
 84 | 54-0054 | 93-00F4 |    2553 |         |         |         |
 85 | 55-0055 | E4-00F5 | C9-2554 |         |         |         |
 86 | 56-0056 | 94-00F6 |    2555 |         |         |         |
 87 | 57-0057 | F6-00F7 |    2556 |         |         |         |
 88 | 58-0058 | 9B-00F8 | BB-2557 |         |         |         |
 89 | 59-0059 | 97-00F9 |    2558 |         |         |         |
 90 | 5A-005A | A3-00FA |    2559 |         |         |         |
 91 | 5B-005B | 96-00FB | C8-255A |         |         |         |
 92 | 5C-005C | 81-00FC |    255B |         |         |         |
 93 | 5D-005D | EC-00FD |    255C |         |         |         |
 94 | 5E-005E | E7-00FE | BC-255D |         |         |         |
 95 | 5F-005F | 98-00FF |    255E |         |         |         |
 96 | 60-0060 | *       |    255F |         |         |         |
 97 | 61-0061 |         | CC-2560 |         |         |         |
 98 | 62-0062 |         |    2561 |         |         |         |
 99 | 63-0063 |         |    2562 |         |         |         |
100 | 64-0064 |         | B9-2563 |         |         |         |
101 | 65-0065 |         |    2564 |         |         |         |
102 | 66-0066 |         |    2565 |         |         |         |
103 | 67-0067 |         | CB-2566 |         |         |         |
104 | 68-0068 |         |    2567 |         |         |         |
105 | 69-0069 |         |    2568 |         |         |         |
106 | 6A-006A |         | CA-2569 |         |         |         |
107 | 6B-006B |         |    256A |         |         |         |
108 | 6C-006C |         |    256B |         |         |         |
109 | 6D-006D |         | CE-256C |         |         |         |
110 | 6E-006E |         |    256D |         |         |         |
111 | 6F-006F |         |    256E |         |         |         |
112 | 70-0070 |         |    256F |         |         |         |
113 | 71-0071 |         |    2570 |         |         |         |
114 | 72-0072 |         |    2571 |         |         |         |
115 | 73-0073 |         |    2572 |         |         |         |
116 | 74-0074 |         |    2573 |         |         |         |
117 | 75-0075 |         |    2574 |         |         |         |
118 | 76-0076 |         |    2575 |         |         |         |
119 | 77-0077 |         |    2576 |         |         |         |
120 | 78-0078 |         |    2577 |         |         |         |
121 | 79-0079 |         |    2578 |         |         |         |
122 | 7A-007A |         |    2579 |         |         |         |
123 | 7B-007B |         |    257A |         |         |         |
124 | 7C-007C |         |    257B |         |         |         |
125 | 7D-007D |         |    257C |         |         |         |
126 | 7E-007E |         |    257D |         |         |         |
127 | 7F-007F |         |    257E |         |         |         |
128 | *       |         |    257F |         |         |         |
129 |         |         | DF-2580 |         |         |         |
130 |         |         |    2581 |         |         |         |
131 |         |         |    2582 |         |         |         |
132 |         |         |    2583 |         |         |         |
133 |         |         | DC-2584 |         |         |         |
134 |         |         |    2585 |         |         |         |
135 |         |         |    2586 |         |         |         |
136 |         |         |    2587 |         |         |         |
137 |         |         | DB-2588 |         |         |         |
138 |         |         |    2589 |         |         |         |
139 |         |         |    258A |         |         |         |
140 |         |         |    258B |         |         |         |
141 |         |         |    258C |         |         |         |
142 |         |         |    258D |         |         |         |
143 |         |         |    258E |         |         |         |
144 |         |         |    258F |         |         |         |
145 |         |         |    2590 |         |         |         |
146 |         |         | B0-2591 |         |         |         |
147 |         |         | B1-2592 |         |         |         |
148 |         |         | B2-2593 |         |         |         |
149 |         |         |    2594 |         |         |         |
150 |         |         |    2595 |         |         |         |
151 |         |         |    2596 |         |         |         |
152 |         |         |    2597 |         |         |         |
153 |         |         |    2598 |         |         |         |
154 |         |         |    2599 |         |         |         |
155 |         |         |    259A |         |         |         |
156 |         |         |    259B |         |         |         |
157 |         |         |    259C |         |         |         |
158 |         |         |    259D |         |         |         |
159 |         |         |    259E |         |         |         |
160 |         |         |    259F |         |         |         |
161 |         |         | FE-25A0 |         |         |         |
    |         |         | *       |         |         |         |
The next step was to write an efficient utility to map all of the high-order bytes in one step. CompileFOR.bat is dependent on probeFOR.bat. It writes diagnostic lines to stderr to show what steps are taken to compile the list. It then writes the final map to stdout, where it can be conveniently captured by redirection, if so desired.

compileFOR.bat

Code: Select all

@echo off
setlocal enableDelayedExpansion

:: Clear $ variables
for /f "delims==" %%A in ('set $ 2^>nul') do set "%%A="

:: Build list of high order bytes x80 - xFF
for %%A in (8 9 A B C D E F) do for %%B in (0 1 2 3 4 5 6 7 8 9 A B C D E F) do set "$x%%A%%B=1"

set /a "minThread=curThread=101, maxThread=100"

:top
set "skip="
set "prev=1000"
for /f "delims=x= tokens=2" %%A in ('set $x 2^>nul') do (
  >&2 echo START %%A
  set /a "$T%curThread%.max=1000"
  set "char=%%A"
  call :buildThread && goto :top
)

:: Print Results
echo(
chcp
set "thread="
for /f "delims=$T.= tokens=1-3" %%A in ('set $T ^| findstr /lv ".max"') do (
  if "%%A" neq "!thread!" (
    set "thread=%%A"
    echo(
    echo Thread !thread:~-2!:
  )
  set "token=%%B"
  echo !token:~-3!=%%C
)
for /f "delims=$x=" %%A in ('set $x 2^>nul') do (
  if defined thread (
    echo(
    echo Inaccessible:
    set "thread="
  )
  echo %%A
)
exit /b


:buildThread
>&2 echo :buildThread  thread=%curThread%  char=%char%
(
  for /f "%skip% tokens=1,3 delims=\x=[] " %%A in ('probeFOR %char% 2^>nul') do (
    set /a "beg=$T%curThread%.max+1, end=beg+10%%B-prev-2, $T%curThread%.max=end+1, prev=10%%B"
    for /l %%N in (!beg! 1 !end!) do set "$T%curThread%.%%N=   "
    set "$T%curThread%.!$T%curThread%.max!=%%A"
    set "$x%%A="
    set "char=%%A"
    for /l %%N in (!minThread! 1 !maxThread!) do if !$T%%N.1001! == %%A (
      set /a "merge=%%N"
      goto :mergeThread
    )
  )
)
if %prev% gtr 1001 (
  set "skip=skip=1"
  set /a prev=1001
  goto :buildThread
)
if not defined $T%curThread%.1001 exit /b 1
set /a "maxThread=curThread, curThread=maxThread+1"
exit /b 0


:mergeThread
>&2 echo :mergeThread %curThread% %merge%
set /a "oldMax=$T%merge%.max, n=$T%merge%.max+=($T%curThread%.max-1001)"
for /l %%N in (!oldMax! -1 1001) do (
  set "$T%merge%.!n!=!$T%merge%.%%N!"
  set /a n-=1
)
for /l %%N in (1001 1 !$T%curThread%.max!) do set "$T%merge%.%%N=!$T%curThread%.%%N!"
for /f "delims==" %%A in ('set $T%curThread%.') do set "%%A="
exit /b 0
pause
And here is the result for code page 437:

Code: Select all

C:\test>compileFOR
START 80
:buildThread  thread=101  char=80
:buildThread  thread=101  char=91
:buildThread  thread=101  char=98
START 8E
:buildThread  thread=102  char=8E
:mergeThread 102 101
START 9B
:buildThread  thread=102  char=9B
:buildThread  thread=102  char=A8
:mergeThread 102 101
START 9E
:buildThread  thread=102  char=9E
START 9F
:buildThread  thread=103  char=9F
START A9
:buildThread  thread=104  char=A9
:buildThread  thread=104  char=F5
START AD
:buildThread  thread=105  char=AD
:mergeThread 105 101
START B0
:buildThread  thread=105  char=B0
:buildThread  thread=105  char=FE
START B3
:buildThread  thread=106  char=B3
:buildThread  thread=106  char=C3
:buildThread  thread=106  char=C1
:buildThread  thread=106  char=D6
:buildThread  thread=106  char=CE
:buildThread  thread=106  char=DB
:mergeThread 106 105
START C4
:buildThread  thread=106  char=C4
:mergeThread 106 105
START E0
:buildThread  thread=106  char=E0
:buildThread  thread=106  char=ED
START E2
:buildThread  thread=107  char=E2
:mergeThread 107 106
START EC
:buildThread  thread=107  char=EC
:buildThread  thread=107  char=EF
:buildThread  thread=107  char=F7
:buildThread  thread=107  char=F2
START F9
:buildThread  thread=108  char=F9
:mergeThread 108 107
START FC
:buildThread  thread=108  char=FC
START FF
:buildThread  thread=109  char=FF

Active code page: 437

Thread 01:
001=AD
002=9B
003=9C
004=
005=9D
006=
007=
008=
009=
010=A6
011=AE
012=AA
013=
014=
015=
016=F8
017=F1
018=FD
019=
020=
021=E6
022=
023=FA
024=
025=
026=A7
027=AF
028=AC
029=AB
030=
031=A8
032=
033=
034=
035=
036=8E
037=8F
038=92
039=80
040=
041=90
042=
043=
044=
045=
046=
047=
048=
049=A5
050=
051=
052=
053=
054=99
055=
056=
057=
058=
059=
060=9A
061=
062=
063=E1
064=85
065=A0
066=83
067=
068=84
069=86
070=91
071=87
072=8A
073=82
074=88
075=89
076=8D
077=A1
078=8C
079=8B
080=
081=A4
082=95
083=A2
084=93
085=
086=94
087=F6
088=
089=97
090=A3
091=96
092=81
093=
094=
095=98

Thread 02:
001=9E

Thread 03:
001=9F

Thread 04:
001=A9
002=
003=
004=
005=
006=
007=
008=
009=
010=
011=
012=
013=
014=
015=
016=
017=F4
018=F5

Thread 05:
001=C4
002=
003=B3
004=
005=
006=
007=
008=
009=
010=
011=
012=
013=DA
014=
015=
016=
017=BF
018=
019=
020=
021=C0
022=
023=
024=
025=D9
026=
027=
028=
029=C3
030=
031=
032=
033=
034=
035=
036=
037=B4
038=
039=
040=
041=
042=
043=
044=
045=C2
046=
047=
048=
049=
050=
051=
052=
053=C1
054=
055=
056=
057=
058=
059=
060=
061=C5
062=
063=
064=
065=
066=
067=
068=
069=
070=
071=
072=
073=
074=
075=
076=
077=
078=
079=
080=
081=CD
082=BA
083=D5
084=D6
085=C9
086=B8
087=B7
088=BB
089=D4
090=D3
091=C8
092=BE
093=BD
094=BC
095=C6
096=C7
097=CC
098=B5
099=B6
100=B9
101=D1
102=D2
103=CB
104=CF
105=D0
106=CA
107=D8
108=D7
109=CE
110=
111=
112=
113=
114=
115=
116=
117=
118=
119=
120=
121=
122=
123=
124=
125=
126=
127=
128=
129=DF
130=
131=
132=
133=DC
134=
135=
136=
137=DB
138=
139=
140=
141=DD
142=
143=
144=
145=DE
146=B0
147=B1
148=B2
149=
150=
151=
152=
153=
154=
155=
156=
157=
158=
159=
160=
161=FE

Thread 06:
001=E2
002=
003=
004=
005=
006=E9
007=
008=
009=
010=
011=
012=
013=
014=
015=
016=
017=E4
018=
019=
020=E8
021=
022=
023=EA
024=
025=
026=
027=
028=
029=
030=
031=E0
032=
033=
034=EB
035=EE
036=
037=
038=
039=
040=
041=
042=
043=
044=
045=
046=E3
047=
048=
049=E5
050=E7
051=
052=ED

Thread 07:
001=F9
002=FB
003=
004=
005=
006=EC
007=
008=
009=
010=
011=
012=
013=
014=
015=
016=
017=EF
018=
019=
020=
021=
022=
023=
024=
025=
026=
027=
028=
029=
030=
031=
032=
033=
034=
035=
036=
037=
038=
039=
040=
041=
042=
043=
044=
045=
046=
047=
048=F7
049=
050=
051=
052=
053=
054=
055=
056=
057=
058=
059=
060=
061=
062=
063=
064=
065=
066=
067=
068=
069=
070=
071=
072=
073=F0
074=
075=
076=F3
077=F2

Thread 08:
001=FC

Inaccessible:
FF
I've done some minimal spot checking, but I have not fully added the UTF-16 code points to help verify the theory.
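
One way to finish that check is to predict the threads from the code points alone and compare them with the listing above. The Python sketch below does this under two assumptions that are not confirmed facts about cmd.exe: a thread continues as long as the next code point is at most 31 positions away (the reach of "tokens=1-31*"), and U+00A0 is dropped because it was reported as inaccessible. It also assumes Python's `cp437` codec matches the Windows code page; `threads` and `MAX_STEP` are illustrative names.

```python
MAX_STEP = 31  # tokens=1-31* reaches at most 31 code points past the base


def threads(codec: str):
    """Group high bytes into threads of (position, byte) by code point distance."""
    points = sorted(
        (ord(bytes([byte]).decode(codec)), byte) for byte in range(0x80, 0x100)
    )
    # U+00A0 is reported as inaccessible in this post, so it is left out here.
    points = [(cp, byte) for cp, byte in points if cp != 0x00A0]
    result = []
    previous_cp = None
    for cp, byte in points:
        if previous_cp is None or cp - previous_cp > MAX_STEP:
            result.append([])  # gap too wide: start a new thread
            start_cp = cp
        result[-1].append((cp - start_cp + 1, byte))
        previous_cp = cp
    return result


if __name__ == "__main__":
    for number, thread in enumerate(threads("cp437"), start=1):
        print(f"Thread {number:02d}:")
        for position, byte in thread:
            print(f"{position:03d}={byte:02X}")
```

The threads come out in code point order rather than in the discovery order used by compileFOR.bat, so they have to be matched by their first byte, not by their thread number.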

One last interesting tidbit
Code page 28591 (ISO/IEC 8859-1) could be really useful with FOR /F - It contains all the characters needed for many Western European languages, and is nearly complete for many more. But more importantly, all characters from 0x01-0xFF are mapped contiguously by FOR /F :!: :D

Even 0xFF is accessible. Only 0x0D cannot be used. So with careful construction, it should be possible to access up to 254 tokens simultaneously when using code page 28591.
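
The count of 254 follows directly from the mapping. Here is a small Python sketch of that arithmetic, assuming that Python's `latin_1` codec is equivalent to code page 28591; `usable_tokens` is a hypothetical helper, and it says nothing about how cmd.exe itself behaves.

```python
def usable_tokens(codec: str = "latin_1") -> int:
    """Count bytes 0x01-0xFF (minus <CR>) if they map to consecutive code points."""
    code_points = [ord(bytes([byte]).decode(codec)) for byte in range(0x01, 0x100)]
    contiguous = all(b - a == 1 for a, b in zip(code_points, code_points[1:]))
    if not contiguous:
        raise ValueError(f"{codec} has gaps in its code point sequence")
    return len(code_points) - 1  # 0x0D cannot be used as a FOR variable


if __name__ == "__main__":
    print(usable_tokens("latin_1"))
```

Calling it with "cp437" or "cp850" raises the ValueError, because those code pages jump from U+007F to U+00C7 at byte 0x80.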


Dave Benham

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: Using many "tokens=..." in FOR /F command in a simple way

#35 Post by jeb » 15 Mar 2017 06:52

Hi Dave,

Very nice and exhaustive examination of the parameters.

I have only one small suggestion:

Add a caret before %char% to avoid problems when the base character is 0x22 or 0x5E (" or ^):

Code: Select all

for %%%% in ("") do for /f "tokens=1-31*" %%^%char% in (


Currently I'm trying to access the 0x0D (CR) token, but it seems to be a bit tricky :)

jeb

fugitive
Posts: 19
Joined: 09 Mar 2017 02:26

Re: Using many "tokens=..." in FOR /F command in a simple way

#36 Post by fugitive » 15 Mar 2017 07:25

Hi Aacini,
I am very interested in your question, so I wrote some code as follows.
My English is not very good, so I may have misunderstood your problem; please forgive me if so.

Code: Select all

@echo off
setlocal enabledelayedexpansion

set /p tok=tokens=:
for /f %%a in ('findstr .* test.txt^|find /v /c ""') do (
   (for /l %%b in (1 1 %%a) do set/p .line%%b=)<test.txt
   for /f "tokens=2 delims==" %%c in ('set .line') do (
      set /a n=0
      call :loop "%%c"
   )
)      
pause
goto :eof

:loop
for /f "tokens=1* delims= " %%i in ("%~1") do (
   set /a n+=1
   if not !n! equ %tok% (
      call :loop "%%j"
   ) else (
      echo;%%i
   )
)
goto :eof
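
The recursion in :loop peels off one token per call until the requested one is reached. For readers who want to see the logic without the batch syntax, here is a rough Python sketch of the same idea; `nth_token` is a made-up name, and the handling of repeated spaces is only an approximation of how "tokens=1* delims= " behaves.

```python
def nth_token(line: str, n: int) -> str:
    """Return the n-th space-delimited token (1-based), or '' if there is none."""
    head, _, rest = line.lstrip(" ").partition(" ")
    if not head:
        return ""  # ran out of tokens before reaching n
    if n == 1:
        return head
    return nth_token(rest, n - 1)


if __name__ == "__main__":
    print(nth_token("alpha beta  gamma delta", 3))
```

Since each token costs one level of recursion (one CALL in the batch version), this approach has no fixed token limit, but it gets slower as the token number grows.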


Fugitive

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Using many "tokens=..." in FOR /F command in a simple way

#37 Post by penpen » 15 Mar 2017 13:41

I just thought about using UTF-8 to access the variables:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion
for /f "tokens=2 delims=:." %%a in ('chcp') do set "cp=%%~a"
set "UTF-16BE=1201"


:: "variables.hex"
if not exist "variables.hex.txt" (
   setlocal
   set "hex= 0 1 2 3 4 5 6 7 8 9 A B C D E F"
   set "line="
   echo(FF FE
   echo(23 00
   for %%b in (!hex!) do for %%a in (!hex!) do echo(%%~b%%~a 01
   echo(23 00
   endlocal
) >"variables.hex.txt"

:: "variables.txt"
if not exist "variables.utf-16le.bom.txt" if exist "variables.hex.txt" (
   >nul certutil.exe -decodehex -f "variables.hex.txt" "variables.utf-16le.bom.txt"
)

>nul chcp 65001
if not exist "variables.utf8.txt" (
   >"variables.utf8.txt" type "variables.utf-16le.bom.txt"
)
<"variables.utf8.txt" set /p "variables="
set "variables=!variables:~1,-1!"

for /l %%a in (0, 9, 255) do (
   set "v=!variables:~%%~a,10!          "
   set "v=!v:~0,10!"
   call :test
)

>nul chcp %cp%

endlocal
goto :eof



:test
<nul set /p "=Testing "!v!":"
for /f "tokens=1-10" %%%v:~0,1% in ("@0 @1 @2 @3 @4 @5 @6 @7 @8 @9") do (
   echo( %%~%v:~0,1% %%~%v:~1,1% %%~%v:~2,1% %%~%v:~3,1% %%~%v:~4,1% %%~%v:~5,1% %%~%v:~6,1% %%~%v:~7,1% %%~%v:~8,1% %%~%v:~9,1%
)

goto :eof
One could also use this to check the order of any characters for variable names.

Sample Output:

Code: Select all

Z:\>test.bat
Testing "ĀāĂăĄąĆćĈĉ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ĉĊċČčĎďĐđĒ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ĒēĔĕĖėĘęĚě": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ěĜĝĞğĠġĢģĤ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ĤĥĦħĨĩĪīĬĭ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ĭĮįİıĲĳĴĵĶ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ĶķĸĹĺĻļĽľĿ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ĿŀŁłŃńŅņŇň": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ňŉŊŋŌōŎŏŐő": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "őŒœŔŕŖŗŘřŚ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ŚśŜŝŞşŠšŢţ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ţŤťŦŧŨũŪūŬ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ŬŭŮůŰűŲųŴŵ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ŵŶŷŸŹźŻżŽž": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "žſƀƁƂƃƄƅƆƇ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ƇƈƉƊƋƌƍƎƏƐ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ƐƑƒƓƔƕƖƗƘƙ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ƙƚƛƜƝƞƟƠơƢ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ƢƣƤƥƦƧƨƩƪƫ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ƫƬƭƮƯưƱƲƳƴ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ƴƵƶƷƸƹƺƻƼƽ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ƽƾƿǀǁǂǃǄǅǆ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ǆǇǈǉǊǋǌǍǎǏ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ǏǐǑǒǓǔǕǖǗǘ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ǘǙǚǛǜǝǞǟǠǡ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ǡǢǣǤǥǦǧǨǩǪ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ǪǫǬǭǮǯǰǱǲǳ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ǳǴǵǶǷǸǹǺǻǼ": @0 @1 @2 @3 @4 @5 @6 @7 @8 @9
Testing "ǼǽǾǿ      ": @0 @1 @2 @3 %~  %~  %~  %~  %~  %~

Z:\>
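
A rough model of the data the script builds may help when reading the output above. The sketch below is Python, used purely as a calculator outside cmd.exe; the function names are invented for this illustration. It assumes the hex dump is exactly what the batch loops echo (BOM, "#", code units 0x0100 to 0x01FF, "#"), and it does not exercise FOR /F itself.

```python
def build_hex_lines():
    """Mirror the lines echoed into variables.hex.txt (low byte first, UTF-16LE)."""
    hexdigits = "0123456789ABCDEF"
    lines = ["FF FE", "23 00"]                 # BOM, then '#'
    for high in hexdigits:                     # %%b in the batch loop
        for low in hexdigits:                  # %%a in the batch loop
            lines.append(f"{high}{low} 01")    # code unit 0x01HL
    lines.append("23 00")                      # closing '#'
    return lines


def decode_variables(lines):
    """Model the certutil -decodehex step and the later read: recover the text."""
    raw = bytes.fromhex("".join(lines).replace(" ", ""))
    text = raw.decode("utf-16")                # the BOM selects little endian
    return text[1:-1]                          # same as !variables:~1,-1!


def windows(variables, width=10, step=9, stop=255):
    """Same slicing as: for /l %%a in (0, 9, 255) with !variables:~%%a,10!."""
    return [variables[i:i + width].ljust(width) for i in range(0, stop + 1, step)]


chars = decode_variables(build_hex_lines())
for w in windows(chars)[:3]:
    print(w)
```

The last window holds only four characters plus padding, which is why the final "Testing" line cannot expand the remaining six variables.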


penpen

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Using many "tokens=..." in FOR /F command in a simple way

#38 Post by dbenham » 17 Mar 2017 16:14

jeb wrote:Currently I'm trying to access the 0x0D (CR) token, but it seems to be a bit tricky :)
Good luck doing that from batch :!: :lol:

But it simply works from the command line, meaning your phase 1.5 applies only to batch, not the command line.

I use [Alt][1][3] to enter CR from the command line, and it shows as a musical eighth note. But it actually contains CR, as evidenced by the results:

Code: Select all

D:\test>for %♪ in (end) do @echo        begin ♪ %♪
 end   begin

D:\test>set "CR=♪"

D:\test>for %%CR% in (end) do @echo        begin %CR% %%CR%
 end   begin

UPDATE - The above was run from my Windows 8 machine at work. When I run the tests from my Windows 10 machine at home, I get different results. The directly entered value stays an eighth note (never functions as a CR). And when I get CR into a variable, it is stripped when I expand with %CR%. But if I change my console properties to use the legacy console, then I can use the CR as above. :shock: I don't understand how the new console can possibly affect the cmd.exe parser phase 1.5 :?

jeb wrote:Add a caret before the %char% to avoid problems when you begin with 22 or 5E (" ^)

Code: Select all

for %%%% in ("") do for /f "tokens=1-31*" %%^%char% in (


Of course - good idea. The caret is required for a few characters, but it never hurts to add it when it is not needed. That change certainly makes the utility easier to use.

I've also modified probeFOR.bat to accept a 2nd optional argument that specifies how many tokens to read. The value should be between 1 and 31. If not specified, then it defaults to 31*
Attachment: probeFOR.zip (1.59 KiB)

I've also modified compileFOR.bat to discover all threads for characters 0x01 - 0xFF. This will only work for single byte code pages that treat bytes 0x01 - 0x7F as ASCII. Again, this script is dependent on probeFOR.bat.

compileFOR.bat

Code: Select all

@echo off
setlocal enableDelayedExpansion

:: Clear $ variables
for /f "delims==" %%A in ('set $ 2^>nul') do set "%%A="

:: Build list of bytes 0x01 - 0xFF
for %%A in (0 1 2 3 4 5 6 7 8 9 A B C D E F) do for %%B in (0 1 2 3 4 5 6 7 8 9 A B C D E F) do set "$x%%A%%B=1"
set "$x00="

set /a "minThread=curThread=101, maxThread=100, cnt=29"

:top
set "skip="
set "prev=1000"
for /f "delims=x= tokens=2" %%A in ('set $x 2^>nul') do (
  >&2 echo START %%A
  set /a "$T%curThread%.max=1000"
  set "char=%%A"
  call :buildThread && (
    set "cnt="
    goto :top
  ) || (
    set "$i%%A=1"
    set "$x%%A="
  )
)

:: Print Results
echo(
chcp
set "thread="
for /f "delims=$T.= tokens=1-3" %%A in ('set $T ^| findstr /lv ".max"') do (
  if "%%A" neq "!thread!" (
    set "thread=%%A"
    echo(
    echo Thread !thread:~-2!:
  )
  set "token=%%B"
  echo !token:~-3!=%%C
)
for /f "delims=$i=" %%A in ('set $i 2^>nul') do (
  if defined thread (
    echo(
    echo Inaccessible:
    set "thread="
  )
  echo %%A
)
exit /b


:buildThread
>&2 echo :buildThread  thread=%curThread:~-2%  char=%char%
(
  for /f "%skip% tokens=1,3 delims=\x=[] " %%A in ('probeFOR %char% %cnt% 2^>nul') do (
    set /a "beg=$T%curThread%.max+1, end=beg+10%%B-prev-2, $T%curThread%.max=end+1, prev=10%%B"
    for /l %%N in (!beg! 1 !end!) do set "$T%curThread%.%%N=   "
    set "$T%curThread%.!$T%curThread%.max!=%%A"
    set "$x%%A="
    set "char=%%A"
    for /l %%N in (!minThread! 1 !maxThread!) do if !$T%%N.1001! == %%A (
      set /a "merge=%%N"
       goto :mergeThread
    )
  )
)
if %prev% gtr 1001 (
  set "skip=skip=1"
  set /a prev=1001
  goto :buildThread
)
if not defined $T%curThread%.1001 exit /b 1
set /a "maxThread=curThread, curThread=maxThread+1"
exit /b 0


:mergeThread
>&2 echo :mergeThread %curThread:~-2% %merge:~-2%
set /a "oldMax=$T%merge%.max, n=$T%merge%.max+=($T%curThread%.max-1001)"
for /l %%N in (!oldMax! -1 1001) do (
  set "$T%merge%.!n!=!$T%merge%.%%N!"
  set /a n-=1
)
for /l %%N in (1001 1 !$T%curThread%.max!) do set "$T%merge%.%%N=!$T%curThread%.%%N!"
for /f "delims==" %%A in ('set $T%curThread%.') do set "%%A="
exit /b 0


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Using many "tokens=..." in FOR /F command in a simple way

#39 Post by dbenham » 17 Mar 2017 17:01

penpen wrote:I just thought about using utf8 to access the variables:

Code: Select all

...


@penpen - Brilliant :!:

I ran a quick test, and you can restore the code page once you have loaded the Unicode strings into the environment variables - the active code page has no effect on the variable expansion. There is no need to use code page 65001 during the FOR /F processing. This opens up an entire world of possibilities. We should be able to access many hundreds of tokens simultaneously, up to the 8191 byte line limit. :D

Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Using many "tokens=..." in FOR /F command in a simple way

#40 Post by dbenham » 17 Mar 2017 23:22

Here is a proof of concept that demonstrates simultaneous access to 300 FOR /F tokens. I use a variant of penpen's technique to define the variables used by FOR /F.

I've created a useful :DefineForChars routine to define the characters. Documentation is embedded within the routine.

Code: Select all

@echo off
setlocal enableDelayedExpansion
call :DefineForChars 2

set "ln="
for /l %%N in (1 1 300) do set "ln=!ln! {%%N}"

for /f                "tokens=1-30*"   %%%$1% in (
    "%ln%") do for /f "tokens=1-30*"  %%%$31% in (
 "%%%$31%") do for /f "tokens=1-30*"  %%%$61% in (
 "%%%$61%") do for /f "tokens=1-30*"  %%%$91% in (
 "%%%$91%") do for /f "tokens=1-30*" %%%$121% in (
"%%%$121%") do for /f "tokens=1-30*" %%%$151% in (
"%%%$151%") do for /f "tokens=1-30*" %%%$181% in (
"%%%$181%") do for /f "tokens=1-30*" %%%$211% in (
"%%%$211%") do for /f "tokens=1-30*" %%%$241% in (
"%%%$241%") do for /f "tokens=1-30*" %%%$271% in (
"%%%$271%") do (
  echo   %%%$1%   %%%$2%   %%%$3%   %%%$4%   %%%$5%   %%%$6%   %%%$7%   %%%$8%   %%%$9%  %%%$10%
  echo  %%%$11%  %%%$12%  %%%$13%  %%%$14%  %%%$15%  %%%$16%  %%%$17%  %%%$18%  %%%$19%  %%%$20%
  echo  %%%$21%  %%%$22%  %%%$23%  %%%$24%  %%%$25%  %%%$26%  %%%$27%  %%%$28%  %%%$29%  %%%$30%
  echo  %%%$31%  %%%$32%  %%%$33%  %%%$34%  %%%$35%  %%%$36%  %%%$37%  %%%$38%  %%%$39%  %%%$40%
  echo  %%%$41%  %%%$42%  %%%$43%  %%%$44%  %%%$45%  %%%$46%  %%%$47%  %%%$48%  %%%$49%  %%%$50%
  echo  %%%$51%  %%%$52%  %%%$53%  %%%$54%  %%%$55%  %%%$56%  %%%$57%  %%%$58%  %%%$59%  %%%$60%
  echo  %%%$61%  %%%$62%  %%%$63%  %%%$64%  %%%$65%  %%%$66%  %%%$67%  %%%$68%  %%%$69%  %%%$70%
  echo  %%%$71%  %%%$72%  %%%$73%  %%%$74%  %%%$75%  %%%$76%  %%%$77%  %%%$78%  %%%$79%  %%%$80%
  echo  %%%$81%  %%%$82%  %%%$83%  %%%$84%  %%%$85%  %%%$86%  %%%$87%  %%%$88%  %%%$89%  %%%$90%
  echo  %%%$91%  %%%$92%  %%%$93%  %%%$94%  %%%$95%  %%%$96%  %%%$97%  %%%$98%  %%%$99% %%%$100%

  echo %%%$101% %%%$102% %%%$103% %%%$104% %%%$105% %%%$106% %%%$107% %%%$108% %%%$109% %%%$110%
  echo %%%$111% %%%$112% %%%$113% %%%$114% %%%$115% %%%$116% %%%$117% %%%$118% %%%$119% %%%$120%
  echo %%%$121% %%%$122% %%%$123% %%%$124% %%%$125% %%%$126% %%%$127% %%%$128% %%%$129% %%%$130%
  echo %%%$131% %%%$132% %%%$133% %%%$134% %%%$135% %%%$136% %%%$137% %%%$138% %%%$139% %%%$140%
  echo %%%$141% %%%$142% %%%$143% %%%$144% %%%$145% %%%$146% %%%$147% %%%$148% %%%$149% %%%$150%
  echo %%%$151% %%%$152% %%%$153% %%%$154% %%%$155% %%%$156% %%%$157% %%%$158% %%%$159% %%%$160%
  echo %%%$161% %%%$162% %%%$163% %%%$164% %%%$165% %%%$166% %%%$167% %%%$168% %%%$169% %%%$170%
  echo %%%$171% %%%$172% %%%$173% %%%$174% %%%$175% %%%$176% %%%$177% %%%$178% %%%$179% %%%$180%
  echo %%%$181% %%%$182% %%%$183% %%%$184% %%%$185% %%%$186% %%%$187% %%%$188% %%%$189% %%%$190%
  echo %%%$191% %%%$192% %%%$193% %%%$194% %%%$195% %%%$196% %%%$197% %%%$198% %%%$199% %%%$200%

  echo %%%$201% %%%$202% %%%$203% %%%$204% %%%$205% %%%$206% %%%$207% %%%$208% %%%$209% %%%$210%
  echo %%%$211% %%%$212% %%%$213% %%%$214% %%%$215% %%%$216% %%%$217% %%%$218% %%%$219% %%%$220%
  echo %%%$221% %%%$222% %%%$223% %%%$224% %%%$225% %%%$226% %%%$227% %%%$228% %%%$229% %%%$230%
  echo %%%$231% %%%$232% %%%$233% %%%$234% %%%$235% %%%$236% %%%$237% %%%$238% %%%$239% %%%$240%
  echo %%%$241% %%%$242% %%%$243% %%%$244% %%%$245% %%%$246% %%%$247% %%%$248% %%%$249% %%%$250%
  echo %%%$251% %%%$252% %%%$253% %%%$254% %%%$255% %%%$256% %%%$257% %%%$258% %%%$259% %%%$260%
  echo %%%$261% %%%$262% %%%$263% %%%$264% %%%$265% %%%$266% %%%$267% %%%$268% %%%$269% %%%$270%
  echo %%%$271% %%%$272% %%%$273% %%%$274% %%%$275% %%%$276% %%%$277% %%%$278% %%%$279% %%%$280%
  echo %%%$281% %%%$282% %%%$283% %%%$284% %%%$285% %%%$286% %%%$287% %%%$288% %%%$289% %%%$290%
  echo %%%$291% %%%$292% %%%$293% %%%$294% %%%$295% %%%$296% %%%$297% %%%$298% %%%$299% %%%$300%
)
exit /b


:DefineForChars  Count
::
:: Defines variables to be used as FOR /F tokens, from $1 to $n, where n = Count*256
:: Also defines $max = Count*256.
:: No other variables are defined or tampered with.
:: Count must be a value between 1 and 9 (inclusive).
::
:: Once defined, the variables are very useful for parsing lines with many tokens, as
:: the values are guaranteed to be contiguous within the FOR /F mapping scheme.
::
:: For example, you can use $1 as a FOR variable by using %%%$1%.
::
::   FOR /F "TOKENS=1-31" %%%$1% IN (....) DO ...
::
::      %%%$1% = token 1, %%%$2% = token 2, ... %%%$31% = token 31
::
:: This routine never uses SETLOCAL, and works regardless whether delayed expansion
:: is enabled or disabled.
::
:: Three temporary files are created and deleted in the %TEMP% folder, and the active
:: code page is temporarily set to 65001, and then restored to the starting value
:: before returning. Once defined, the $n variables can be used with any code page.
::
for /f "tokens=2 delims=:." %%P in ('chcp') do call :DefineForCharsInternal %1
exit /b
:DefineForCharsInternal
set /a $max=%1*256
>"%temp%\forVariables.%~1.hex.txt" (
  echo FF FE
  for %%H in (
    "0 1 2 3 4 5 6 7 8 9 A B C D E F"
  ) do for /l %%N in (1 1 %~1) do for %%A in (%%~H) do for %%B in (%%~H) do (
    echo %%A%%B 0%%N 0D 00 0A 00
  )
)
>nul certutil.exe -decodehex -f "%temp%\forVariables.%~1.hex.txt" "%temp%\forVariables.%~1.utf-16le.bom.txt"
>nul chcp 65001
>"%temp%\forVariables.%~1.utf8.txt" type "%temp%\forVariables.%~1.utf-16le.bom.txt"
<"%temp%\forVariables.%~1.utf8.txt" (for /l %%N in (1 1 %$max%) do set /p "$%%N=")
for %%. in (dummy) do >nul chcp %%P 
del "%temp%\forVariables.%~1.*.txt"
exit /b


The input line looks like " {1} {2} {3} ... {300}"

And here is the output that demonstrates success:

Code: Select all

  {1}   {2}   {3}   {4}   {5}   {6}   {7}   {8}   {9}  {10}
 {11}  {12}  {13}  {14}  {15}  {16}  {17}  {18}  {19}  {20}
 {21}  {22}  {23}  {24}  {25}  {26}  {27}  {28}  {29}  {30}
 {31}  {32}  {33}  {34}  {35}  {36}  {37}  {38}  {39}  {40}
 {41}  {42}  {43}  {44}  {45}  {46}  {47}  {48}  {49}  {50}
 {51}  {52}  {53}  {54}  {55}  {56}  {57}  {58}  {59}  {60}
 {61}  {62}  {63}  {64}  {65}  {66}  {67}  {68}  {69}  {70}
 {71}  {72}  {73}  {74}  {75}  {76}  {77}  {78}  {79}  {80}
 {81}  {82}  {83}  {84}  {85}  {86}  {87}  {88}  {89}  {90}
 {91}  {92}  {93}  {94}  {95}  {96}  {97}  {98}  {99} {100}
{101} {102} {103} {104} {105} {106} {107} {108} {109} {110}
{111} {112} {113} {114} {115} {116} {117} {118} {119} {120}
{121} {122} {123} {124} {125} {126} {127} {128} {129} {130}
{131} {132} {133} {134} {135} {136} {137} {138} {139} {140}
{141} {142} {143} {144} {145} {146} {147} {148} {149} {150}
{151} {152} {153} {154} {155} {156} {157} {158} {159} {160}
{161} {162} {163} {164} {165} {166} {167} {168} {169} {170}
{171} {172} {173} {174} {175} {176} {177} {178} {179} {180}
{181} {182} {183} {184} {185} {186} {187} {188} {189} {190}
{191} {192} {193} {194} {195} {196} {197} {198} {199} {200}
{201} {202} {203} {204} {205} {206} {207} {208} {209} {210}
{211} {212} {213} {214} {215} {216} {217} {218} {219} {220}
{221} {222} {223} {224} {225} {226} {227} {228} {229} {230}
{231} {232} {233} {234} {235} {236} {237} {238} {239} {240}
{241} {242} {243} {244} {245} {246} {247} {248} {249} {250}
{251} {252} {253} {254} {255} {256} {257} {258} {259} {260}
{261} {262} {263} {264} {265} {266} {267} {268} {269} {270}
{271} {272} {273} {274} {275} {276} {277} {278} {279} {280}
{281} {282} {283} {284} {285} {286} {287} {288} {289} {290}
{291} {292} {293} {294} {295} {296} {297} {298} {299} {300}
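
The same bookkeeping can be modelled outside cmd.exe. The following Python sketch (not batch; the names for_char and chained_parse are made up here) assumes that $n holds the n-th code point counting from U+0100, as the hex lines written by :DefineForChars suggest, and imitates the chained "tokens=1-30*" passes over a space-delimited line. It only mirrors the token arithmetic, not the real FOR /F parser.

```python
def for_char(n):
    """Character assumed to be stored in $n: the n-th code point from U+0100."""
    return chr(0x0100 + n - 1)


def chained_parse(line, total, chunk=30):
    """Imitate nested FOR /F "tokens=1-<chunk>*" loops over a space-delimited line.

    Each pass takes up to `chunk` tokens and hands the unparsed remainder to
    the next pass, which is the role %%%$31%, %%%$61%, ... play in the batch code.
    """
    tokens = {}
    rest = line
    n = 1
    while n <= total:
        parts = rest.split(None, chunk)        # at most chunk tokens + remainder
        head = parts[:chunk]
        rest = parts[chunk] if len(parts) > chunk else ""
        for offset, value in enumerate(head):
            tokens[for_char(n + offset)] = value
        n += chunk
    return tokens


# Build the same kind of test line as the batch script: " {1} {2} ... {300}"
line = "".join(f" {{{i}}}" for i in range(1, 301))
parsed = chained_parse(line, 300)
print(parsed[for_char(1)], parsed[for_char(31)], parsed[for_char(300)])
```

Because every pass reuses the remainder variable of the previous pass as its own first variable, the 300 names stay contiguous across all ten loops.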


The next step is to write a routine that will build a FOR /F macro consisting of a series of FOR /F statements, with the last one ending at DO.

The calling sequence would be something like the following:

Code: Select all

call :defineFor  150  inClause  forMacroName   [/s skipCount]  [/d "delims"]  [/e eolCharacter]  [/u] 
%forMacroName% (
  echo token 1 = %%%$1%
  echo token 2 = %%%$2%
  echo token 3 = %%%$3%
  ...
  echo token 150 = %%%$150%
)


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Using many "tokens=..." in FOR /F command in a simple way

#41 Post by dbenham » 18 Mar 2017 08:03

I've refined my concept of the :defineFor macro creator. I've decided to let the user have complete control of the initial FOR /F loop that would load each unparsed ("delims=") line into a normal FOR variable. The user can deal with EOL, SKIP, USEBACKQ normally, and the macro need only ever work with strings and worry about the delims. This greatly simplifies construction, yet maintains complete flexibility. The macro would automatically set EOL to the first character in delimsVar.

So the calling sequence might look like:

Code: Select all

call :defineFor  forMacroName  initialLetter  tokenCount  [delimsVar]

An example of usage for a 150 column CSV with header (no empty values, no commas in values) might look something like:

Code: Select all

set "delims=,"
call :defineFor For$nInA A 150 delims
for /f "usebackq skip=1 delims=" %%A in ("someFile.csv") do %For$nInA% (
  echo token 1 = %%%$1%
  echo token 2 = %%%$2%
  echo token 3 = %%%$3%
  ...
  echo token 150 = %%%$150%
)


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Using many "tokens=..." in FOR /F command in a simple way

#42 Post by dbenham » 18 Mar 2017 19:55

And here is the finished product 8)

Documentation is embedded at the top of :defineFor and :defineForChars

I've tested as many as 900 tokens in one line, but this example uses "only" 300.

Code: Select all

@echo off
setlocal enableDelayedExpansion
cls

:: Define a test line with 300 tokens in the format "{1},{2},{3},...{300},"
set "ln="
for /l %%N in (1 1 300) do set "ln=!ln!{%%N},"

:: Define a macro for parsing 300 tokens, capturing the remainder in an extra token
call :defineFor For300InA A 300 ","

:: Parse the line with FOR /F and the macro, and print out all 300 tokens, plus any remainder
for /f "delims=" %%A in ("!ln!These,are,extra,unparsed,tokens") do %For300InA% (
  echo   %%%$1%   %%%$2%   %%%$3%   %%%$4%   %%%$5%   %%%$6%   %%%$7%   %%%$8%   %%%$9%  %%%$10%
  echo  %%%$11%  %%%$12%  %%%$13%  %%%$14%  %%%$15%  %%%$16%  %%%$17%  %%%$18%  %%%$19%  %%%$20%
  echo  %%%$21%  %%%$22%  %%%$23%  %%%$24%  %%%$25%  %%%$26%  %%%$27%  %%%$28%  %%%$29%  %%%$30%
  echo  %%%$31%  %%%$32%  %%%$33%  %%%$34%  %%%$35%  %%%$36%  %%%$37%  %%%$38%  %%%$39%  %%%$40%
  echo  %%%$41%  %%%$42%  %%%$43%  %%%$44%  %%%$45%  %%%$46%  %%%$47%  %%%$48%  %%%$49%  %%%$50%
  echo  %%%$51%  %%%$52%  %%%$53%  %%%$54%  %%%$55%  %%%$56%  %%%$57%  %%%$58%  %%%$59%  %%%$60%
  echo  %%%$61%  %%%$62%  %%%$63%  %%%$64%  %%%$65%  %%%$66%  %%%$67%  %%%$68%  %%%$69%  %%%$70%
  echo  %%%$71%  %%%$72%  %%%$73%  %%%$74%  %%%$75%  %%%$76%  %%%$77%  %%%$78%  %%%$79%  %%%$80%
  echo  %%%$81%  %%%$82%  %%%$83%  %%%$84%  %%%$85%  %%%$86%  %%%$87%  %%%$88%  %%%$89%  %%%$90%
  echo  %%%$91%  %%%$92%  %%%$93%  %%%$94%  %%%$95%  %%%$96%  %%%$97%  %%%$98%  %%%$99% %%%$100%

  echo %%%$101% %%%$102% %%%$103% %%%$104% %%%$105% %%%$106% %%%$107% %%%$108% %%%$109% %%%$110%
  echo %%%$111% %%%$112% %%%$113% %%%$114% %%%$115% %%%$116% %%%$117% %%%$118% %%%$119% %%%$120%
  echo %%%$121% %%%$122% %%%$123% %%%$124% %%%$125% %%%$126% %%%$127% %%%$128% %%%$129% %%%$130%
  echo %%%$131% %%%$132% %%%$133% %%%$134% %%%$135% %%%$136% %%%$137% %%%$138% %%%$139% %%%$140%
  echo %%%$141% %%%$142% %%%$143% %%%$144% %%%$145% %%%$146% %%%$147% %%%$148% %%%$149% %%%$150%
  echo %%%$151% %%%$152% %%%$153% %%%$154% %%%$155% %%%$156% %%%$157% %%%$158% %%%$159% %%%$160%
  echo %%%$161% %%%$162% %%%$163% %%%$164% %%%$165% %%%$166% %%%$167% %%%$168% %%%$169% %%%$170%
  echo %%%$171% %%%$172% %%%$173% %%%$174% %%%$175% %%%$176% %%%$177% %%%$178% %%%$179% %%%$180%
  echo %%%$181% %%%$182% %%%$183% %%%$184% %%%$185% %%%$186% %%%$187% %%%$188% %%%$189% %%%$190%
  echo %%%$191% %%%$192% %%%$193% %%%$194% %%%$195% %%%$196% %%%$197% %%%$198% %%%$199% %%%$200%

  echo %%%$201% %%%$202% %%%$203% %%%$204% %%%$205% %%%$206% %%%$207% %%%$208% %%%$209% %%%$210%
  echo %%%$211% %%%$212% %%%$213% %%%$214% %%%$215% %%%$216% %%%$217% %%%$218% %%%$219% %%%$220%
  echo %%%$221% %%%$222% %%%$223% %%%$224% %%%$225% %%%$226% %%%$227% %%%$228% %%%$229% %%%$230%
  echo %%%$231% %%%$232% %%%$233% %%%$234% %%%$235% %%%$236% %%%$237% %%%$238% %%%$239% %%%$240%
  echo %%%$241% %%%$242% %%%$243% %%%$244% %%%$245% %%%$246% %%%$247% %%%$248% %%%$249% %%%$250%
  echo %%%$251% %%%$252% %%%$253% %%%$254% %%%$255% %%%$256% %%%$257% %%%$258% %%%$259% %%%$260%
  echo %%%$261% %%%$262% %%%$263% %%%$264% %%%$265% %%%$266% %%%$267% %%%$268% %%%$269% %%%$270%
  echo %%%$271% %%%$272% %%%$273% %%%$274% %%%$275% %%%$276% %%%$277% %%%$278% %%%$279% %%%$280%
  echo %%%$281% %%%$282% %%%$283% %%%$284% %%%$285% %%%$286% %%%$287% %%%$288% %%%$289% %%%$290%
  echo %%%$291% %%%$292% %%%$293% %%%$294% %%%$295% %%%$296% %%%$297% %%%$298% %%%$299% %%%$300%

  echo(%%%$301%
)
exit /b


:defineFor  ForMacroName  InputVar  TokenCount  [DelimChars]
::
:: Defines a macro to be used for parsing an arbitrary number of tokens from
:: a FOR variable string. The macro always parses one additional token to hold
:: any remainder of the line that lies beyond the TokenCount tokens.
::
::    ForMacroName = The name of the macro variable to be created.
::
::    InputVar = The name of the FOR variable that contains the string of tokens.
::
::    TokenCount = The number of tokens to parse.
::                 The maximum value is 2304 (256*9)
::
::    DelimChars = An optional string of one or more characters, each of which
::                 is treated as a token delimiter. Default is "<tab><space>".
::                 If <space> is included in the string, then it must be the
::                 last character in the string.
::
:: Tokens are accessed by $n variables.
:: For example, %%%$45% would represent the 45th token.
::
:: FOR /F modifiers may be freely used. For example, %%~nx%$10% would treat the
:: 10th token as a file path, and would expand to the file name and extension.
::
:: Normally, a single FOR /F is limited to 31 tokens, but the macro supports
:: many more, theoretically as many as 2304. However, each line to be parsed
:: must be less than 8191 characters in length.
::
:: This function may be called with delayed expansion enabled or disabled.
:: It is generally recommended that the macro be used with delayed expansion
:: disabled so that tokens containing ! are not corrupted.
::
:: This function automatically calls :defineForChars to define enough $n
:: variables to satisfy the TokenCount+1 tokens.
::
:: Example usage - Suppose you want to parse a well behaved CSV file named
:: test.csv that contains 300 columns. All lines must have the same number of
:: columns, and no column value may contain a comma.
::
:: The following code will correctly parse each data line of test.csv:
::
::    @echo off
::    setlocal disableDelayedExpansion
::    call :defineFor For300InA A 300 ","
::    for /f "skip=1 delims=" %%A in (test.csv) do %For300InA% (
::      echo token   1 = %%%$1%
::      echo token   2 = %%%$2%
::      echo ...
::      echo token 300 = %%%$300%
::    )
::
:: If the first token might begin with any character, including the default
:: EOL character, then the FOR /F line should be changed as follows:
::
::    for /f skip^=1^ delims^=^ eol^= %%A in (test.csv) do %For300InA% (
::   

if %$max%0 gtr %~30 goto :defineForInternal
set /a "$max=(%~3)/256+1"
call :defineForChars %$max%
:defineForInternal
setlocal enableDelayedExpansion
set "delims=%~4"
if not defined delims set "delims= "
set "in=%~2"
set "macro="
set /a max=31, end=0
for /l %%N in (1 31 %~3) do (
  if %%N neq 1 set "in=!$%%N!"
  set /a end+=31
  if !end! gtr %~3 set /a "max=%~3-%%N+1"
  set "macro=!macro! for /f "eol=!delims:~0,1! tokens=1-!max!* delims=!delims!" %%!$%%N! in ("%%!in!") do"
)
for /f "delims=" %%A in ("!macro! ") do endlocal & set "%~1=%%A"
exit /b


:defineForChars  Count
::
:: Defines variables to be used as FOR /F tokens, from $1 to $n, where n = Count*256
:: Also defines $max = Count*256.
:: No other variables are defined or tampered with.
::
:: Once defined, the variables are very useful for parsing lines with many tokens, as
:: the values are guaranteed to be contiguous within the FOR /F mapping scheme.
::
:: For example, you can use $1 as a FOR variable by using %%%$1%.
::
::   FOR /F "TOKENS=1-31" %%%$1% IN (....) DO ...
::
::      %%%$1% = token 1, %%%$2% = token 2, ... %%%$31% = token 31
::
:: This routine never uses SETLOCAL, and works regardless whether delayed expansion
:: is enabled or disabled.
::
:: Three temporary files are created and deleted in the %TEMP% folder, and the active
:: code page is temporarily set to 65001, and then restored to the starting value
:: before returning. Once defined, the $n variables can be used with any code page.
::
for /f "tokens=2 delims=:." %%P in ('chcp') do call :DefineForCharsInternal %1
exit /b
:defineForCharsInternal
set /a $max=%1*256
>"%temp%\forVariables.%~1.hex.txt" (
  echo FF FE
  for %%H in (
    "0 1 2 3 4 5 6 7 8 9 A B C D E F"
  ) do for /l %%N in (1 1 %~1) do for %%A in (%%~H) do for %%B in (%%~H) do (
    echo %%A%%B 0%%N 0D 00 0A 00
  )
)
>nul certutil.exe -decodehex -f "%temp%\forVariables.%~1.hex.txt" "%temp%\forVariables.%~1.utf-16le.bom.txt"
>nul chcp 65001
>"%temp%\forVariables.%~1.utf8.txt" type "%temp%\forVariables.%~1.utf-16le.bom.txt"
<"%temp%\forVariables.%~1.utf8.txt" (for /l %%N in (1 1 %$max%) do set /p "$%%N=")
for %%. in (dummy) do >nul chcp %%P  
del "%temp%\forVariables.%~1.*.txt"
exit /b
--OUTPUT--

Code: Select all

  {1}   {2}   {3}   {4}   {5}   {6}   {7}   {8}   {9}  {10}
 {11}  {12}  {13}  {14}  {15}  {16}  {17}  {18}  {19}  {20}
 {21}  {22}  {23}  {24}  {25}  {26}  {27}  {28}  {29}  {30}
 {31}  {32}  {33}  {34}  {35}  {36}  {37}  {38}  {39}  {40}
 {41}  {42}  {43}  {44}  {45}  {46}  {47}  {48}  {49}  {50}
 {51}  {52}  {53}  {54}  {55}  {56}  {57}  {58}  {59}  {60}
 {61}  {62}  {63}  {64}  {65}  {66}  {67}  {68}  {69}  {70}
 {71}  {72}  {73}  {74}  {75}  {76}  {77}  {78}  {79}  {80}
 {81}  {82}  {83}  {84}  {85}  {86}  {87}  {88}  {89}  {90}
 {91}  {92}  {93}  {94}  {95}  {96}  {97}  {98}  {99} {100}
{101} {102} {103} {104} {105} {106} {107} {108} {109} {110}
{111} {112} {113} {114} {115} {116} {117} {118} {119} {120}
{121} {122} {123} {124} {125} {126} {127} {128} {129} {130}
{131} {132} {133} {134} {135} {136} {137} {138} {139} {140}
{141} {142} {143} {144} {145} {146} {147} {148} {149} {150}
{151} {152} {153} {154} {155} {156} {157} {158} {159} {160}
{161} {162} {163} {164} {165} {166} {167} {168} {169} {170}
{171} {172} {173} {174} {175} {176} {177} {178} {179} {180}
{181} {182} {183} {184} {185} {186} {187} {188} {189} {190}
{191} {192} {193} {194} {195} {196} {197} {198} {199} {200}
{201} {202} {203} {204} {205} {206} {207} {208} {209} {210}
{211} {212} {213} {214} {215} {216} {217} {218} {219} {220}
{221} {222} {223} {224} {225} {226} {227} {228} {229} {230}
{231} {232} {233} {234} {235} {236} {237} {238} {239} {240}
{241} {242} {243} {244} {245} {246} {247} {248} {249} {250}
{251} {252} {253} {254} {255} {256} {257} {258} {259} {260}
{261} {262} {263} {264} {265} {266} {267} {268} {269} {270}
{271} {272} {273} {274} {275} {276} {277} {278} {279} {280}
{281} {282} {283} {284} {285} {286} {287} {288} {289} {290}
{291} {292} {293} {294} {295} {296} {297} {298} {299} {300}
These,are,extra,unparsed,tokens
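
The chunking done by the FOR /L loop in :defineFor can be checked with a few lines of arithmetic. This Python sketch is an illustration only (chunk_plan is a hypothetical helper, not part of the batch code). For each generated FOR /F it lists the index of its first $n variable and how many tokens it takes before the "*" remainder.

```python
def chunk_plan(token_count, per_loop=31):
    """Return (first $n index, tokens taken) for each FOR /F in the macro."""
    plan = []
    # Same stepping as: for /l %%N in (1 31 TokenCount)
    for start in range(1, token_count + 1, per_loop):
        # The last loop only takes what is left, like max=TokenCount-%%N+1
        count = min(per_loop, token_count - start + 1)
        plan.append((start, count))
    return plan


for start, count in chunk_plan(300):
    print(f"FOR /F starting at ${start}: tokens=1-{count}*")
```

For 300 tokens this yields ten loops, the last one starting at $280 and taking 21 tokens, so the remainder lands in $301.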
Dave Benham

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Using many "tokens=..." in FOR /F command in a simple way

#43 Post by penpen » 18 Mar 2017 20:35

I've tested all valid Unicode characters within the BMP above U+003F (under Windows 10) to see whether they can be used as a "for/f" variable and whether they are directly sequential.

There are two sequences as expected:
1) U+003F, ..., U+D7FF
2) U+E000, ..., U+FFFF

I'm using some (lazy) definitions: "declaration-variable" and "auto-generated".
In the following example, "%%a" is a "declaration-variable" and "%%b" is "auto-generated":

Code: Select all

for /f "tokens=1*" %%a in ("@1 @2") do echo(%%~a %%~b
You also never need to escape any of them for this usage.

If the variable is auto-generated, then yes, this variable can be used as a for/f variable.

But I (accidentally) found out that you cannot use all characters for the "declaration-variable":
The unicode characters U+202F (' '), and U+205F (' ') are two examples for that.


My test code (on my pc it takes ~10 seconds to create the files, and ~12 seconds to execute the test):

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion
cls

for /f "tokens=2 delims=:." %%a in ('chcp') do set "cp=%%~a"
set "UTF-16LE=1200"

>nul chcp 850
>"debug.txt" <nul set /p "=ï»¿"

for %%a in ("debug.txt") do (
   if not "%%~za" == "3" (
      echo(This program needs ANSI output; use: "cmd /A".
      chcp %cp%
      goto :eof
   )
)

:: "variables.hex"
if not exist "variables.hex.txt" (
   setlocal
   set "hex= 0 1 2 3 4 5 6 7 8 9 A B C D E F"
   set "line="
   set "last="
   echo(FF FE

   rem: Basic Multilingual Plane {65 chars each row encapsulated in #'s; dos newlines; 1 overlap}.
   for %%d in (!hex!) do for %%c in (!hex!) do for %%B in ("0 1 2 3" "4 5 6 7" "8 9 A B" "C D E F") do (
      if "00,0 1 2 3" == "%%~d%%~c,%%~B" (
         rem: Ignored: contains control characters.
         set "last=3F 00"
      ) else if "100,4 5 6 7" == "%%~d%%~c,%%~B" (
         rem: Ignored: contains pipe character.
         set "last=7F 00"
      ) else if "D8" == "%%~d%%~c" ( rem: Ignored: High Surrogate
      ) else if "D9" == "%%~d%%~c" ( rem: Ignored: High Surrogate
      ) else if "DA" == "%%~d%%~c" ( rem: Ignored: High Surrogate
      ) else if "DB" == "%%~d%%~c" ( rem: Ignored: High Surrogate
      ) else if "DC" == "%%~d%%~c" ( rem: Ignored: Low Surrogate
      ) else if "DD" == "%%~d%%~c" ( rem: Ignored: Low Surrogate
      ) else if "DE" == "%%~d%%~c" ( rem: Ignored: Low Surrogate
      ) else if "DF" == "%%~d%%~c" ( rem: Ignored: Low Surrogate
      ) else (
         echo(23 00 !last!
         for %%b in (%%~B) do (
            for %%a in (!hex!) do (
               echo(%%~b%%~a %%~d%%~c
            )
            set "last=%%~bF %%~d%%~c"
         )
         echo(23 00 0D 00 0A 00
      )
   )
   endlocal
) >"variables.hex.txt"

:: "variables.txt"
if not exist "variables.utf-16le.bom.txt" if exist "variables.hex.txt" (
   >nul certutil.exe -decodehex -f "variables.hex.txt" "variables.utf-16le.bom.txt"
)

>nul chcp 65001
if not exist "variables.utf8.txt" (
   >"variables.utf8.txt" type "variables.utf-16le.bom.txt"
)

echo(!time!
<"variables.utf8.txt" (
::                                 991 == 1024 - 1 (control characters) - 16 (high surrogate) - 16 (low surrogate)
   for /l %%b in (10001, 1, 10991) do (
      set "index=%%~b"
      set "index=!index:~1!"

      set "variables="
      set /p "variables="
      set "variables=!variables:~1,-1!"

      for /l %%a in (0, 16, 63) do (
         set "v=!variables:~%%~a,17!                 "
         set "v=!v:~0,17!"
rem         <nul set /p "=Testing "!v!" [!index!]:"
         if "t" == "f" ( rem: dummy
         ) else if "!index! %%~a" == "0001 16" (
            <nul set /p "=Testing "!v!" [!index!]:"
            call :test_CIRCUMFLEX_ACCENT

         ) else if "!index! %%~a" == "0001 48" (
            <nul set /p "=Testing "!v!" [!index!]:"
            call :test_PIPE

         ) else if "!index! %%~a" == "0128 32" (
            <nul set /p "=Testing "!v!" [!index!]:"
            set "test=!v!"
            call :test
         ) else if "!index! %%~a" == "0128 48" (
            set "test=!test!!v:~1!"
            set "v=!test!"
            call :test_Uplus202F_Uplus205F

         ) else if "!index! %%~a" == "0129 16" (
            <nul set /p "=Testing "!v!" [!index!]:"
            set "test=!v!"
            call :test
         ) else if "!index! %%~a" == "0129 32" (
            set "test=!test!!v:~1!"
            set "v=!test!"
            call :test_Uplus202F_Uplus205F

         ) else if "!index! %%~a" == "0864 0"  (
            <nul set /p "=Testing "!v!" [!index!]:"
            call :test_UplusD7FF_UplusE000

         ) else (
            <nul set /p "=Testing "!v!" [!index!]:"
            call :test
         )
      )
   )
) >>"debug.txt"
echo(!time!


findstr /n /v /l "@01 @02 @03 @04 @05 @06 @07 @08 @09 @10 @11 @12 @13 @14 @15 @16 @17" "debug.txt" && echo(Usability test 1: Failed. || echo(Usability test 1: Succeeded.

>nul chcp %cp%

endlocal
goto :eof


:test_CIRCUMFLEX_ACCENT
for /f "tokens=1-17" %%%v:~0,1% in ("@01 @02 @03 @04 @05 @06 @07 @08 @09 @10 @11 @12 @13 @14 @15 @16 @17") do (
   echo( %%~%v:~0,1% %%~%v:~1,1% %%~%v:~2,1% %%~%v:~3,1% %%~%v:~4,1% %%~%v:~5,1% %%~%v:~6,1% %%~%v:~7,1% %%~%v:~8,1% %%~%v:~9,1% %%~%v:~10,1% %%~%v:~11,1% %%~%v:~12,1% %%~%v:~13,1% %%~%v:~14,1% %%~^%v:~15,1% %%~%v:~16,1%
)
goto :eof

:test_PIPE
for /f "tokens=1-17" %%%v:~0,1% in ("@01 @02 @03 @04 @05 @06 @07 @08 @09 @10 @11 @12 @13 @14 @15 @16 @17") do (
   echo( %%~%v:~0,1% %%~%v:~1,1% %%~%v:~2,1% %%~%v:~3,1% %%~%v:~4,1% %%~%v:~5,1% %%~%v:~6,1% %%~%v:~7,1% %%~%v:~8,1% %%~%v:~9,1% %%~%v:~10,1% %%~%v:~11,1% %%~%v:~12,1% %%~^%v:~13,1% %%~%v:~14,1% %%~%v:~15,1% %%~%v:~16,1%
)
goto :eof

:test_Uplus202F_Uplus205F
:: U+202F cannot be used in variable declaration part of for !
:: U+205F the same
<nul set /p "=Testing "!v:~4,17!" [!index!]:"
for /f "tokens=1-17" %%%v:~4,1% in ("@01 @02 @03 @04 @05 @06 @07 @08 @09 @10 @11 @12 @13 @14 @15 @16 @17") do (
   echo( %%~%v:~4,1% %%~%v:~5,1% %%~%v:~6,1% %%~%v:~7,1% %%~%v:~8,1% %%~%v:~9,1% %%~%v:~10,1% %%~%v:~11,1% %%~%v:~12,1% %%~%v:~13,1% %%~%v:~14,1% %%~%v:~15,1% %%~%v:~16,1% %%~%v:~17,1% %%~%v:~18,1% %%~%v:~19,1% %%~%v:~20,1%
)
<nul set /p "=Testing "!v:~18!  " [!index!]:"
for /f "tokens=1-15" %%%v:~18,1% in ("@01 @02 @03 @04 @05 @06 @07 @08 @09 @10 @11 @12 @13 @14 @15") do (
   echo( %%~%v:~18,1% %%~%v:~19,1% %%~%v:~20,1% %%~%v:~21,1% %%~%v:~22,1% %%~%v:~23,1% %%~%v:~24,1% %%~%v:~25,1% %%~%v:~26,1% %%~%v:~27,1% %%~%v:~28,1% %%~%v:~29,1% %%~%v:~30,1% %%~%v:~31,1% %%~%v:~32,1% @16 @17
)
goto :eof

:test_UplusD7FF_UplusE000
for /f "tokens=1-16" %%%v:~1,1% in ("@02 @03 @04 @05 @06 @07 @08 @09 @10 @11 @12 @13 @14 @15 @16 @17") do (
   echo( @01 %%~%v:~1,1% %%~%v:~2,1% %%~%v:~3,1% %%~%v:~4,1% %%~%v:~5,1% %%~%v:~6,1% %%~%v:~7,1% %%~%v:~8,1% %%~%v:~9,1% %%~%v:~10,1% %%~%v:~11,1% %%~%v:~12,1% %%~%v:~13,1% %%~%v:~14,1% %%~%v:~15,1% %%~%v:~16,1%
)
goto :eof

:test
for /f "tokens=1-17" %%%v:~0,1% in ("@01 @02 @03 @04 @05 @06 @07 @08 @09 @10 @11 @12 @13 @14 @15 @16 @17") do (
   echo( %%~%v:~0,1% %%~%v:~1,1% %%~%v:~2,1% %%~%v:~3,1% %%~%v:~4,1% %%~%v:~5,1% %%~%v:~6,1% %%~%v:~7,1% %%~%v:~8,1% %%~%v:~9,1% %%~%v:~10,1% %%~%v:~11,1% %%~%v:~12,1% %%~%v:~13,1% %%~%v:~14,1% %%~%v:~15,1% %%~%v:~16,1%
)
goto :eof


penpen

Edit: Corrected utf-8 bom to "".

Thor
Posts: 43
Joined: 31 Mar 2016 15:02

Re: Using many "tokens=..." in FOR /F command in a simple way

#44 Post by Thor » 18 Mar 2017 23:14

dbenham wrote:And here is the finished product 8)

Documentation is embedded at the top of :defineFor and :defineForChars

I've tested as many as 900 tokens in one line, but this example uses "only" 300.

Dave Benham


Very nice batch file, I could test up to 1323 tokens correctly. :D
But with more than that, nothing is displayed at the command prompt.
Originally I wanted to test up to 2100 tokens, but nothing was displayed, so I reduced the count until, at 1323 tokens, the output was displayed correctly. From the 1324th token on, nothing is displayed at the command prompt. I don't know what went wrong, or is that the limit?

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Using many "tokens=..." in FOR /F command in a simple way

#45 Post by penpen » 19 Mar 2017 19:12

So i've found four (or five) types of characters (tested only on Windows 10, 32-bit, German, fully patched):
1) I couldn't use these for any "for/f"-variable:
- U+0000 "NULL",
- U+000D "CARRIAGE RETURN (CR)"

2) These characters are only usable in automatically created "for/f"-variables, but you cannot use them to declare/define such a variable:
- U+0009 "CHARACTER TABULATION",
- U+000B "LINE TABULATION",
- U+000C "FORM FEED (FF)",
- U+0020 "SPACE",
- U+002C "COMMA",
- U+003B "SEMICOLON",
- U+003D "EQUALS SIGN",
- U+00A0 "NO-BREAK SPACE",
- U+1680 "OGHAM SPACE MARK",
- U+180E "MONGOLIAN VOWEL SEPARATOR",
- U+2000 "EN QUAD",
- U+2001 "EM QUAD",
- U+2002 "EN SPACE",
- U+2003 "EM SPACE",
- U+2004 "THREE-PER-EM SPACE",
- U+2005 "FOUR-PER-EM SPACE",
- U+2006 "SIX-PER-EM SPACE",
- U+2007 "FIGURE SPACE",
- U+2008 "PUNCTUATION SPACE",
- U+2009 "THIN SPACE",
- U+200A "HAIR SPACE",
- U+2028 "LINE SEPARATOR",
- U+2029 "PARAGRAPH SEPARATOR",
- U+202F "NARROW NO-BREAK SPACE",
- U+205F "MEDIUM MATHEMATICAL SPACE",
- U+3000 "IDEOGRAPHIC SPACE"

3) You have to escape these (most with a caret, but line feed with itself) in order to use them as "for/f"-variables:
- U+000A "LINE FEED (LF)",
- U+0022 "QUOTATION MARK",
- U+0026 "AMPERSAND",
- U+003C "LESS-THAN SIGN",
- U+003E "GREATER-THAN SIGN",
- U+005E "CIRCUMFLEX ACCENT",
- U+007C "VERTICAL LINE"
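
A minimal sketch of the caret escape for a Type 3 character, modelled on the :test_PIPE routine earlier in this thread. The values "1 2 3" and the starting variable z are made up for illustration. Tokens 1-3 starting at z (U+007A) land in z, { (U+007B) and the vertical line (U+007C), and only the last one is written with a caret:

```batch
@echo off
rem Sketch only: the third token is stored in the vertical line character (U+007C).
rem That character belongs to Type 3, so it is escaped with a caret where it is used.
for /f "tokens=1-3" %%z in ("1 2 3") do echo %%z %%{ %%^|
```

Without the caret, the vertical line would be parsed as the pipe operator before the "for/f"-variable is expanded.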

4) Any other character within U+0001 - U+D7FF and U+E000 - U+FFFF can be used without escaping for any "for/f"-variable.

5) I haven't tested any character built out of surrogates, because:
- if UTF-16 is used, then they all consist of 2 surrogates, or
- if UCS-2 is used, then a single surrogate is not easy to produce (it is undefined, and therefore invalid, in UTF-8, so the UTF-8 byte sequences matching these values are replaced by U+FFFD "REPLACEMENT CHARACTER").

The longest possible sequence of characters without any limitation to be used as "for/f"-variables is U+3001 - U+D7FF (43007 characters):
That should be sufficient :D .
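
The count for that range can be checked with cmd's own arithmetic; this is only a quick sanity check, and the variable name count is arbitrary:

```batch
@echo off
rem Number of code units in the inclusive range U+3001 .. U+D7FF.
rem set /a accepts hexadecimal literals with a 0x prefix.
set /a "count=0xD7FF - 0x3001 + 1"
echo %count%
```

This prints 43007, matching the figure above.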


Thor wrote:I don't know what went wrong or is it the limit?
Without seeing your code one can only guess:
Most probably you store the search string in a variable and have reached its maximum capacity of 8191 characters.
But it might also be that one of the variables you want to declare is one of the above (Type 2), although this should provoke an error message on stderr.
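
To check the first guess, one could measure how long the string really is. The following is a rough sketch, not taken from anyone's code in this thread: the variable names list, rest and len are hypothetical, and the short sample string stands in for the real token list.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample; replace it with the string that is passed to for /f.
set "list=@01 @02 @03"
set "rest=!list!"
set /a len=0
:count
rem Strip one character per pass until the variable becomes undefined.
if defined rest (
    set "rest=!rest:~1!"
    set /a len+=1
    goto :count
)
echo Length: !len! (limit mentioned above: 8191)
endlocal
```

The loop is slow for strings of several thousand characters, but it needs no helper tools.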

Command-line example of a Type 2 character, ";":

Code: Select all

Z:\>for /f "tokens=1-2" %; in ("3 4") do @echo %;
"%" kann syntaktisch an dieser Stelle nicht verarbeitet werden.

Z:\>for /f "tokens=1-2" %: in ("3 4") do @echo %: %;
3 4


penpen

Edit: Thanks to Dave for finding the wrong classification of U+000B "LINE TABULATION", and U+000C "FORM FEED (FF)"; i've corrected the above list.
