Discussion about jeb's batch parsing rules on StackOverflow

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Discussion about jeb's batch parsing rules on StackOverflow

#16 Post by penpen » 30 Jan 2018 18:22

dbenham wrote:
30 Jan 2018 07:51
Else break the command token before the first occurrence of + / [ ] or standard token delimiter
I'm unsure if this is right.
Under Windows 10, enable debugging as npocmaka_ described (viewtopic.php?t=6438&p=54459#p54448; I've built it into "enableDebug.bat" in the following post).

Create "test.bat":

Code:

@echo off
if a==a echo[ test.bat
echo[ test.bat
goto :eof
Create "echo[.bat":

Code:

@echo( echo[.bat
Then this is the result (at least on my Windows 10 Home x64):

Code:

Z:\>enableDebug.bat
Z:\>test.bat
Cmd: test.bat  Type: 0
@
  Cmd: echo  Type: 0 Args: ` off'
if
  Cmd: a  Type: 39 Args: `a'
  Cmd: echo[  Type: 0 Args: ` test.bat'
 test.bat
Cmd: echo[  Type: 0 Args: ` test.bat'
@
  Cmd: echo  Type: 0 Args: `(echo[.bat'
echo[.bat

Z:\>
There you can see that the parsing splits the command after the '[' character (but before the '(' character).

penpen

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#17 Post by dbenham » 30 Jan 2018 23:23

Your code gives the exact output I expect based on my proposed rules.
penpen wrote:
30 Jan 2018 18:22
There you can see that the parsing splits the command after the '[' character (but before the '(' character).
You lost me. I don't see a single line in any of your code where you have a line with '[' before a '('.

The only line that has both characters is the line within ECHO[.BAT, and that has '(' before '['. And phase 2 breaks the command token before '(', as expected. So the command token is 'ECHO' when it reaches phase 7, which is simply executed as an internal command.


Dave Benham

#18 Post by penpen » 31 Jan 2018 06:59

I meant that on my Windows 10 the following seems to happen:
The command "echo[ test.bat" is split not before but after the '[' character, into "echo[" and " test.bat".
But I also wanted to confirm that the command "echo(test.bat" is split before the '(' character, into "echo" and "(test.bat".

penpen

#19 Post by dbenham » 31 Jan 2018 08:42

Yes.

ECHO(TEST.BAT

In phase 2, ( is a token delimiter, so ECHO(TEST.BAT is split into a command token of ECHO, and arguments token of (TEST.BAT.

Phase 7 simply gets the command token of ECHO, and immediately recognizes it as an internal command.

ECHO[ TEST.BAT

In phase 2, [ is not a token delimiter, but <space> is. The command is split into a command token of ECHO[ and arguments token of <space>TEST.BAT.

Phase 7 does not recognize the ECHO[ command token as an internal command. So it moves on to the next test.
The command token is split before [, and it recognizes ECHO as a potential internal command. How it is handled depends on whether ECHO[.BAT exists:
- If the batch script exists, then it is executed
- If not, then the internal command is executed, with [<space>TEST.BAT as the arguments.
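
The phase 2 split described in this post can be modelled in a few lines of Python. This is only a sketch of the rule as stated in the thread, not cmd.exe's actual implementation: the name `split_phase2` is hypothetical, and the delimiter set (space, tab, `;`, `,`, `=`, 0xFF, plus `(`) is an assumption pieced together from the posts in this discussion.

```python
# Characters assumed to end the command token in phase 2, per the thread:
# space, tab, ';', ',', '=', 0xFF, and additionally '('.
PHASE2_DELIMS = set(" \t;,=\xff(")


def split_phase2(line):
    """Split a command line into (command token, arguments token)."""
    for i, ch in enumerate(line):
        if ch in PHASE2_DELIMS:
            # The delimiter itself stays at the start of the arguments token.
            return line[:i], line[i:]
    # No delimiter at all: the whole line is the command token.
    return line, ""


print(split_phase2("ECHO(TEST.BAT"))   # ('ECHO', '(TEST.BAT')
print(split_phase2("ECHO[ TEST.BAT"))  # ('ECHO[', ' TEST.BAT')
```

Because `[` is not in the set, `ECHO[` survives phase 2 as one command token and is only broken up later, in phase 7.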


Dave Benham

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

#20 Post by jeb » 31 Jan 2018 10:45

Thanks penpen for the idea to test with the debug mode.

I tested it with

Code:

echo ###1
echo(###2
echo (###3
echo[###4
echo=###5
I only show the relevant parts
output wrote: Cmd: echo Type: 0 Args: ` ###1'
Cmd: echo Type: 0 Args: `(###2'
Cmd: echo Type: 0 Args: ` (###3'
Cmd: echo[###4 Type: 0
Cmd: echo Type: 0 Args: `=###5'
My conclusion is:
(like Dave) In phase 2, ( is a token delimiter like ;,= and it is part of the argument.

But since even for the plain echo ###1 the first character of the argument is a space, I suppose that the first character is always removed.

The only question left is why echo(/? works but echo=/? fails, even though the arguments look much the same. The explanation is in the PPS.

jeb
PS: It seems that most other commands can't work properly with any delimiter other than a space.
IF works with the standard delimiters ;,= and space, but not with (
SET only removes trailing space, but SET,a=b sets the variable ",a" to "b"

PPS:
I suppose the ECHO parsing for /? works quite differently from that of other commands.
It tests whether the first "token" starts with "/?", and if so, it shows the help.
But it strips all (standard) delimiters while it searches.
These variants all show the help:

Code:

echo /?
echo=/?
echo=/?
echo   =;, /=,; ?
echo ;,/;=?TEXT
But it only strips the standard delimiters (,;= and space), which is why "echo(" works so nicely 8)
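
The hypothesis in this PPS can be written down as a tiny Python model. It is a sketch of the guess made in the post, not of ECHO's real code: `echo_shows_help` is a hypothetical name, and the delimiter set is limited to the four characters named in the post.

```python
# Standard delimiters named in the post: comma, semicolon, equals, space.
STANDARD_DELIMS = set(",;= ")


def echo_shows_help(args):
    """Guess whether ECHO would print its help for this arguments token."""
    # Assumption from the post: all standard delimiters are skipped
    # while ECHO looks for a leading "/?".
    stripped = "".join(ch for ch in args if ch not in STANDARD_DELIMS)
    return stripped.startswith("/?")


print(echo_shows_help("   =;, /=,; ?"))  # True
print(echo_shows_help("(/?"))            # False
```

In this model `(` is not skipped, so `(/?` no longer begins with `/?` and the help is masked.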

#21 Post by jeb » 31 Jan 2018 12:12

Now that it's clear, I also found some examples that demonstrate the token splitting even without debug mode.

Code:

@echo off
setlocal EnableDelayedExpansion
set var=#####
echo]!var! ^^^^  #1
echo(!var! ^^^^  #2
output wrote:##### ^^ #1
##### ^ #2
In the first example the carets are only reduced by phase 2, but not by phase 5, as the exclamation marks are part of the command token
cmdToken="echo]!var!"
args = "^^ #1"

But the second sample shows that the "(" is a cmdToken delimiter and therefore the args activate phase5 and reduce the carets a second time
cmdToken="echo"
args = "(!var! ^^ #2"
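
The caret counts in the two samples above can be reproduced with a small Python model. It is a sketch under two assumptions taken from this thread: phase 2 removes one level of caret escaping from the whole line, and phase 5 removes another level only from a token that contains at least one `!`. The names `strip_escapes` and `reduce_carets` are hypothetical, and variable expansion itself is not modelled.

```python
def strip_escapes(text):
    """Remove each caret and keep the character it escapes."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "^":
            i += 1                   # drop the caret itself
            if i < len(text):
                out.append(text[i])  # keep the escaped character
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def reduce_carets(cmd_token, args):
    # Phase 2: carets are processed once for the whole line.
    cmd_token, args = strip_escapes(cmd_token), strip_escapes(args)
    # Phase 5 (assumed): runs per token, only if the token contains '!'.
    if "!" in cmd_token:
        cmd_token = strip_escapes(cmd_token)
    if "!" in args:
        args = strip_escapes(args)
    return cmd_token, args


print(reduce_carets("echo]!var!", " ^^^^  #1"))   # ('echo]!var!', ' ^^  #1')
print(reduce_carets("echo", "(!var! ^^^^  #2"))  # ('echo', '(!var! ^  #2')
```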

The conclusion is that different sets of delimiters exist for the different phases.
Phase 2: ,;= space and (
Phase 7: :][/\;,=+ space, but not (
And the individual commands can have another, different set

#22 Post by dbenham » 31 Jan 2018 12:49

jeb wrote:
31 Jan 2018 10:45
I suppose the ECHO parsing for /? works quite differently from that of other commands.
It tests whether the first "token" starts with "/?", and if so, it shows the help.
But it strips all (standard) delimiters while it searches.
These variants all show the help:

Code:

echo /?
echo=/?
echo=/?
echo   =;, /=,; ?
echo ;,/;=?TEXT
But it only strips the standard delimiters (,;= and space), which is why "echo(" works so nicely 8)
I pretty much concluded the same thing at Re: ECHO. FAILS to give text or blank line - Instead use ECHO/
dbenham wrote: When ECHO sees ;/? or ,/? or =/? it sees a token delimiter and then a string beginning with help option. So it prints help.

But (/? does not start with a token delimiter, so the /? is masked. Then the leading character is stripped and the remainder is printed.
Dave Benham

#23 Post by dbenham » 31 Jan 2018 12:55

jeb wrote:
31 Jan 2018 12:12
The conclusion is that different sets of delimiters exist for the different phases.
Phase 2: ,;= space and (
Phase 7: :][/\;,=+ space, but not (
And the individual commands can have another, different set
Phase 2 - you forgot <tab> and <0xFF>

Phase 7 - Not all those delimiters are treated equally, as outlined by my 7.1 rules.
Also, I'm pretty sure the exact same rules are used to determine what internal command to execute, regardless of the command.
Things don't vary by command until the individual command parses the command arguments.


Dave Benham

#24 Post by jeb » 31 Jan 2018 13:12

Good point
dbenham wrote:
31 Jan 2018 12:49
I pretty much concluded the same thing at Re: ECHO. FAILS to give text or blank line - Instead use ECHO/
dbenham wrote:
When ECHO sees ;/? or ,/? or =/? it sees a token delimiter and then a string beginning with help option. So it prints help.

But (/? does not start with a token delimiter, so the /? is masked. Then the leading character is stripped and the remainder is printed.
But when I read it, I didn't understand why echo should see the "(" at all and why it is removed later.
dbenham wrote:
31 Jan 2018 12:55
Phase 2 - you forgot <tab> and <0xFF>
You are right, I was sure that I forgot something :D
dbenham wrote:
31 Jan 2018 12:55
Phase 7 - Not all those delimiters are treated equally, as outlined by my 7.1 rules.
Also, I'm pretty sure the exact same rules are used to determine what internal command to execute, regardless of the command.
Things don't vary by command until the individual command parses the command arguments.
I think the same, my text was a bit misleading.
In phase 7, TAB is also a delimiter. But 0xFF is special here:

Code:

@echo off
setlocal EnableDelayedExpansion

for /f "delims=" %%A in (
    'forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0xFF"'
) do set "FF=%%~A"

set "var=!FF!#####"

echo!var! ^^^^  #1
echo%var% ^^^^  #2
Output wrote: ##### ^^ #1
##### ^^ #2
As you can see in sample #1, it works as a delimiter in phase 7, but it is also printed by ECHO.

#25 Post by penpen » 31 Jan 2018 14:01

I still don't understand why there are different results (see my above example) between:
- "if a==a echo[ test.bat" which executes the internal command "echo", and
- "echo[ test.bat" which executes the batch file "echo[.bat".


penpen

#26 Post by dbenham » 31 Jan 2018 15:15

jeb wrote:
31 Jan 2018 13:12
In phase 7, TAB is also a delimiter. But 0xFF is special here:

Code:

@echo off
setlocal EnableDelayedExpansion

for /f "delims=" %%A in (
    'forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0xFF"'
) do set "FF=%%~A"

set "var=!FF!#####"

echo!var! ^^^^  #1
echo%var% ^^^^  #2
Output wrote: ##### ^^ #1
##### ^^ #2
As you can see in sample #1, it works as a delimiter in phase 7, but it is also printed by ECHO.
The #1 behavior kind of makes sense if 0xFF is considered to be a non-breaking space.
But the difference between #1 and #2 is odd. That means the ECHO command must be able to tell which characters are left over from the command token, and which are from the original arguments token.
That means my following rule doesn't quite tell the whole story:
dbenham wrote: If an internal command is parsed from a larger command token, then the unused portion of the command token is included in the argument list
I had assumed that the left over text is simply prepended to the arguments token before the internal command parses the arguments.
But instead, the internal command must parse the left over text as a separate token before it parses the original arguments token.


#27 Post by dbenham » 31 Jan 2018 15:27

penpen wrote:
31 Jan 2018 14:01
I still don't understand why there are different results (see my above example) between:
- "if a==a echo[ test.bat" which executes the internal command "echo", and
- "echo[ test.bat" which executes the batch file "echo[.bat".
I don't understand why MS wrote cmd.exe that way.
But my rules do predict the behavior.
dbenham wrote:
30 Jan 2018 07:51
  • Else break the command token before the first occurrence of + / [ ] or standard token delimiter
    If the preceding text is an internal command, then remember that command
    • If in command line mode, or if the command is from a parenthesized block, IF command block, FOR command block, or involved with command concatenation, then execute the internal command
    • Else (must be a stand-alone command in batch mode) scan the current folder and the PATH for a .COM, .EXE, .BAT, or .CMD file whose base name matches the original command token
      • If the first matching file is a .BAT or .CMD, then goto 7.3.exec and execute that script
      • Else (match not found or first match is .EXE or .COM) execute the remembered internal command
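Both commands match the green rule (break the command token before [ and remember the internal command ECHO).
The first command matches the blue rule (the command is from an IF command block, so the internal command is executed).
The second command matches the brown rules (a stand-alone command in batch mode, so the current folder and the PATH are scanned and the matching script ECHO[.BAT is executed).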
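
The quoted rule is easier to follow as a decision function. The Python below is a sketch of the rule exactly as worded in the quote, not of cmd.exe itself: `resolve_command` and its parameters are hypothetical names, and the file search is reduced to "extension of the first matching file, or None".

```python
def resolve_command(internal_name, in_block_or_cmdline, first_match_ext=None):
    """Decide what runs once an internal command name has been remembered.

    internal_name:       internal command found before the break character
    in_block_or_cmdline: True in command line mode, inside a parenthesized,
                         IF or FOR block, or with command concatenation
    first_match_ext:     extension of the first file whose base name matches
                         the original command token, or None if none exists
    """
    if in_block_or_cmdline:
        return "internal:" + internal_name
    # Stand-alone command in batch mode: a matching script wins.
    if first_match_ext is not None and first_match_ext.lower() in (".bat", ".cmd"):
        return "script"
    # No match, or the first match is .EXE or .COM.
    return "internal:" + internal_name


# penpen's two lines, with echo[.bat present:
print(resolve_command("echo", True, ".bat"))   # internal:echo
print(resolve_command("echo", False, ".bat"))  # script
```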


Dave Benham

#28 Post by jeb » 31 Jan 2018 16:19

dbenham wrote:
31 Jan 2018 15:27
I don't understand why MS wrote cmd.exe that way.
But my rules do predict the behavior.
Yes, the rules seem to be accurate to explain how it works.
But I can't believe that a sane human would code it that way.
I suppose that the behaviour is only a side effect of some parts we currently don't know or don't fully understand.

But now back to delimiters ... :D
I forgot to test line feed and carriage return.

Both characters seem to work as delimiters only in phase 7.
Carriage return can't be tested in phase 2, as I know of no way to inject a carriage return that survives up to phase 2.

But for the ECHO command they are not among the delimiters, so both of them work for /?

Code:

@echo off
setlocal EnableDelayedExpansion

for /f "delims=" %%A in (
    'forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0xFF"'
) do set "FF=%%~A"
for /F %%# in ('copy /Z "%~dpf0" NUL') do set "CR=%%#"                         & rem capture carriage return char
(set LF=^
%=empty line=%
)
set "var=#########"

REM *** References
echo !var:~,7! ^^^^ #1
echo(!var:~,7! ^^^^ #2
echo(/?        #3 ( echo delimiter test

echo^%LF%%LF%!var:~,7! ^^^^ #4  LF Phase2 test- Fail, it is handled in phase7
echo!LF!!var:~,7! ^^^^ #5	LF Phase7 test
echo!CR!!var:~,7! ^^^^ #6	CR Phase7 test

echo!LF!/?        #7   LF   echo!LF!delimiter!CR!test
echo!CR!/?        #8 CR echo delimiter!CR!test



Regarding the output of test #7 and #8 :?: :!:
Output wrote:####### ^ #1
####### ^ #2
/? #3 ( echo delimiter test
var:~,7 ^ #4 LF Phase2 test - Fail, it is handled in phase7
var:~,7 ^ #5 LF Phase7 test
var:~,7 ^ #6 CR Phase7 test
/? #7 LF echo delimiter test
/? #8 CR echo delimiter test
In #7 and #8, all <space>, <tab>, <CR> and <LF> characters are replaced by spaces, and each consecutive run is reduced to a single space :!:
I've never seen this behaviour in batch before :shock:

jeb

#29 Post by dbenham » 01 Feb 2018 07:28

Freaky :shock:

Your !var:~,7! expansion is broken because you forgot to escape the comma, so the expression is split between the command token and the arguments token in phase 2.
jeb wrote:
31 Jan 2018 16:19
In #7 and #8, all <space>, <tab>, <CR> and <LF> characters are replaced by spaces, and each consecutive run is reduced to a single space :!:
I've never seen this behaviour in batch before :shock:
Your description isn't quite correct. It is true for those characters within the command token. But in the arguments token the behavior changes a bit. The <space> and <tab> within the arguments token are still collapsed into one space if <LF> or <CR> appears in the command token, but <CR> and <LF> are printed normally in the arguments token.

And we have seen very similar behavior before, but we never realized it.

Code:

@echo off
setlocal EnableDelayedExpansion
(set LF=^
%=empty line=%
)
prompt prompt$g
echo on

for %%A in (1   ====,,,,,;;;;; 2


3) do @echo %%A

(echo 1
echo 2)
--OUTPUT--

Code:

prompt>for %A in (1 2 3) do @echo %A
1
2
3

prompt>(
echo 1
 echo 2
)
1
2
But in this case all token delimiters are collapsed into a single space, not just white space characters.
And within a parenthesized block of commands, consecutive <LF> characters are collapsed into <LF><space>.

This led me to discover a totally brand new effect :!:

Code:

@echo off
setlocal
(set LF=^
%= Empty line results in <LF> =%
)
prompt prompt$g
echo on
for %%A in (1 =,;%LF%%LF%;,= 2) do @echo %%A
for %%A in ("1 =,;%LF%%LF%;,= 2^") do @echo %%A
(echo 1 =,;%LF%%LF%;,= echo 2)
(echo "1 =,;%LF%%LF%;,= echo 2^")
--OUTPUT-- (broken up into multiple code blocks to prevent scrolling)

Code:

prompt>for %A in (1 2) do @echo %A
1
2

prompt>for %A in ("1 =,; 2") do @echo %A
"1 =,; 2"

Code:

prompt>(
echo 1 =,;
 echo 2
)
1 =,;
2

prompt>(
echo "1 =,;
 echo 2"
)
"1 =,;
2"
%LF% is fully functional within phase 2 as long as it is within a parenthesized block. It does not instantly terminate the line :shock: :!:

So the rules for <LF> in phase 2 need serious updating:

The behavior of <LF> varies depending on context. But quotes never alter the behavior.

Escaped <LF>
  • <LF> is stripped
  • Next character is escaped, even if it is <LF>
Unescaped <LF> not within parentheses
  • <LF> is stripped
  • Remainder of line is not parsed (totally ignored)
Unescaped <LF> within FOR IN parenthesized block
  • <LF> turns off the quote flag
  • <LF> functions as any other token delimiter
  • Consecutive token delimiters are collapsed into a single space
Unescaped <LF> within parenthesized command block
  • <LF> turns off the quote flag
  • <LF> converted into <LF><space>
  • Any immediately trailing string of token delimiters and <LF> is stripped
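
The two parenthesized cases from the list above can be checked against the echoed output in this post with a short Python model. It is a sketch only: `for_in_block` and `command_block` are hypothetical names, the delimiter set (space, tab, `;`, `,`, `=`, 0xFF) is taken from the thread, and quote handling is deliberately left out.

```python
# Phase 2 token delimiters named in the thread (without <LF>).
DELIMS = set(" \t;,=\xff")


def for_in_block(text):
    """Unescaped <LF> in a FOR IN (...) block acts like any delimiter;
    each run of delimiters collapses into a single space."""
    out = []
    in_run = False
    for ch in text:
        if ch in DELIMS or ch == "\n":
            if not in_run:
                out.append(" ")
            in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out)


def command_block(text):
    """Unescaped <LF> in a parenthesized command block becomes <LF><space>;
    the immediately following run of delimiters and <LF> is stripped."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\n":
            out.append("\n ")
            i += 1
            while i < len(text) and (text[i] in DELIMS or text[i] == "\n"):
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(for_in_block("1 =,;\n\n;,= 2"))                   # 1 2
print(repr(command_block("echo 1 =,;\n\n;,= echo 2")))  # 'echo 1 =,;\n echo 2'
```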

Dave Benham

#30 Post by jeb » 02 Feb 2018 01:55

dbenham wrote:
01 Feb 2018 07:28
Your !var:~,7! expansion is broken because you forgot to escape the comma, so the expression is split between the command token and the arguments token in phase 2.
That was exactly my intention: to simply detect whether the character between ECHO and !var is a phase 2 delimiter.


dbenham wrote:
01 Feb 2018 07:28
This led me to discover a totally brand new effect
...
%LF% is fully functional within phase 2 as long as it is within a parenthesized block. It does not instantly terminate the line :shock: :!:

So the rules for <LF> in phase 2 need serious updating:
:D I suppose we are too old to remember all the things we already know.
Do you remember the batch macros? They use exactly this effect: when the macro is defined, the LF is injected in escaped form, but when the macro is executed, an unescaped LF is used.

dbenham wrote:
01 Feb 2018 07:28
So the rules for <LF> in phase 2 need serious updating:
The behavior of <LF> varies depending on context. But quotes never alter the behavior.
You are right, I never thought of adding this to the rules, as I discovered the LF parenthesis trick much later.

jeb
