Discussion about jeb's batch parsing rules on StackOverflow
Moderator: DosItHelp
The purpose of this thread is to have a central place to discuss the batch parsing rules on StackOverflow that jeb initiated.
Of particular interest are discussions about shortcomings or inaccuracies of the current model, along with suggestions for improvements.
I've already made a great many changes to the originally posted rules, but there is still room for improvement.
There are already a number of DosTips threads that investigate various aspects of this topic. At some point I may add links to those topics. But I hope future discussion always takes place here.
Currently there are two issues that I am thinking about:
1) Should phases 3 and 4 be reversed?
The echoing of parsed commands (phase 3) occurs at two points: after the initial round of phase 2 (main parser), and then again after each round of phase 4 (FOR variable expansion for each DO iteration).
I think the logic would be much simpler to describe if the order of phases 3 and 4 were reversed. But I am reluctant to renumber the phases for fear of breaking phase references in historical posts.
What do you think, jeb?
2) I think phase 7 (command execution) needs some refinement
I greatly expanded phase 7. But I see a potential problem, and I'm not sure how to correct it.
Sometimes a command can be both an internal command and an external command. For example, creation of an ECHO.BAT file.
Clearly the parser generally selects the internal command over the external command in phase 7.
Assuming ECHO.BAT exists in the current folder, then ECHO OK will print OK (execute the internal command) instead of executing the ECHO.BAT.
The CALL rules in phase 6 already account for the fact that CALL ECHO will call the batch script instead, because phase 6 identifies the batch script before phase 7 has a chance to execute the internal command.
Also supporting the existing rules, if I have TEST.BAT in the current folder, then when I execute ECHO\..\TEST, it simply prints out ..\TEST
But I am disturbed by ECHO\..\TEST.BAT - it executes the batch script instead
Also, ECHO.BAT will execute the batch script instead of the internal command.
I'm struggling to find a set of simple rules that can account for the differences.
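The cases listed above can be rerun with a small harness. This is only a sketch: the scratch folder name, the :run helper, and case.cmd are made up for illustration. Each command is written to its own throwaway script because a batch file started without CALL never returns to its caller. No expected output is shown; results should be checked on the Windows version at hand.

```batch
@echo off
setlocal
rem Assumption: a scratch folder under %TEMP% (the name is arbitrary).
set "work=%TEMP%\phase7_demo"
if not exist "%work%" mkdir "%work%"
pushd "%work%" || exit /b 1

rem Two marker scripts that only announce that they ran.
> ECHO.BAT echo @echo ECHO.BAT script ran
> TEST.BAT echo @echo TEST.BAT script ran

call :run ECHO OK
call :run CALL ECHO
call :run ECHO\..\TEST
call :run ECHO\..\TEST.BAT
call :run ECHO.BAT

del case.cmd ECHO.BAT TEST.BAT
popd
exit /b 0

:run
rem Write the command under test as a stand-alone line of case.cmd,
rem then run that script in a child CMD so control always comes back.
> case.cmd echo @echo off
>> case.cmd echo %*
echo --- %*
cmd /c case.cmd
exit /b
```

Running each case in a child CMD keeps the cases independent, at the cost of testing only the stand-alone batch context.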
Dave Benham
Re: Discussion about jeb's batch parsing rules on StackOverflow
I just realized - I think issue 2 can be resolved by a small change to 7.1:
proposed change wrote:
- 7.1 - Execute internal command - If the command token is quoted or the command token is a path to an existing file (any extension must be included), then 7.1 is skipped. Otherwise, if an internal command can be parsed from the command token, then execute the internal command.
  - Normally the command token exactly matches the name of an internal command. But it is possible for options and/or arguments to be included in the command token. For example `echo(Hello world` is parsed as an ECHO command with arguments `Hello world`. The exact internal command parsing rules vary from command to command.
Not explicitly stated, but the path need not match an executable file. If the command token matches any existing file, then 7.1 is skipped. Later on, 7.3 execution will fail with an error if it is unable to match the command token with a valid executable file.
Can anyone find any exceptions to the above rule?
Re: Discussion about jeb's batch parsing rules on StackOverflow
dbenham wrote (25 Jan 2018 15:58):
1) Should phases 3 and 4 be reversed?
The echoing of parsed commands (phase 3) occurs at two points: after the initial round of phase 2 (main parser), and then again after each round of phase 4 (FOR variable expansion for each DO iteration).
I think the logic would be much simpler to describe if the order of phases 3 and 4 were reversed. But I am reluctant to renumber the phases for fear of breaking phase references in historical posts.
I think it's already in the correct order now.
Code: Select all
@echo off
setlocal
prompt #
echo on
FOR /L %%n in (1 1 2) DO (
echo %%n
)
Output wrote:
#FOR /L %n in (1 1 2) DO (echo %n )
#(echo 1 )
1
#(echo 2 )
2
For any command, phase 3 occurs first (in this case the FOR main line is echoed) and then the FOR loop phase starts.
In each loop iteration (phase 4) the FOR variables are expanded, then control goes to phase 3 to recheck the ECHO state, and it returns to phase 4 after all other phases are done.
I'm not sure if phase 4 is still a "phase", as it stands a little bit outside of the normal phase order.
Your proposed changes for phase 7 look better than my original text and they are much more extensive.
Re: Discussion about jeb's batch parsing rules on StackOverflow
I think you missed my point for issue 1.
With the current phase numbering
A normal command flows as follows:
0 -> 1 -> 2 -> 3 -> skip 4 -> 5 -> usually skip 6 -> 7
Phase 3 only executes if command block in previously executed phase 2 did not start with @
Phase 3 shows the results of phase 2
When a FOR command executes in 7, it kicks off the DO commands, starting with phase 4:
3
^
4 -> 5 -> usually skip 6 -> 7
Phase 4 must explicitly call phase 3 as a subroutine.
Phase 3 only executes if command block in previously executed (not skipped) phase 4 did not start with @
Phase 3 shows the results of phase 4
Phase 3 then returns to 4 before it flows to 5.
Or, in a linear layout, it flows as
4 -> 3 -> skip 4 -> 5 -> usually skip 6 -> 7
My proposed new phase numbering
A normal command flows as follows:
0 -> 1 -> 2 -> skip 3 -> 4 -> 5 -> usually skip 6 -> 7
Phase 4 only executes if command block in previously executed phase 2 did not start with @
Phase 4 shows the results of phase 2
No matter the order, a normal command always skips FOR expansion, so the order does not matter much
When a FOR command executes in 7, it kicks off the DO commands, starting with phase 3:
3 -> 4 -> 5 -> usually skip 6 -> 7
Phase 3 simply flows naturally into phase 4
Phase 4 only executes if command block in previously executed phase 3 did not start with @ - a sensible order
Phase 4 shows the results of phase 3
----------------
Does my proposal make sense now? If starting from scratch, I would definitely use the modified numbering. But for historical reasons, I am reluctant.
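The role of @ in the echo step can be observed with a short script. This is a sketch only; the # prompt and the words "plain" and "quiet" are arbitrary choices. If the description above is right, the first loop should show each expanded DO command before its output, while the second loop should show only the FOR line itself.

```batch
@echo off
setlocal
rem SETLOCAL limits the PROMPT change to this script.
prompt #
echo on
@rem DO command without @: echoed again after FOR variable expansion.
for /L %%n in (1 1 2) do echo plain %%n
@rem DO command prefixed with @: the per-iteration echo should be suppressed.
for /L %%n in (1 1 2) do @echo quiet %%n
@echo off
```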
Back to Issue 2
Well, my proposed rules were too simple.
I've come up with the following revised rules that seem to work for me on Win 7. I'll test soon on Win 10.
7.1 - Execute internal command - If the command token is quoted, then skip this step. Otherwise, attempt to parse out an internal command and execute.
- The following tests are run to determine if an unquoted command token represents an internal command:
  - If the command token exactly matches an internal command, then execute it.
  - Else break the command token at the first occurrence of + ( / [ or ]
    If the preceding text is an internal command, then execute it
  - Else break the original command token at the first occurrence of . \ or :
    If the preceding text is not an internal command, then goto 7.2
    Else the preceding text may be an internal command. Remember this command.
    - Break the original command token at the first occurrence of + ( / [ or ]
      If the preceding text is a path to an existing file, then goto 7.2
      Else execute the remembered command.
- Just because a command token is parsed as an internal command does not mean that it will execute successfully. Each internal command has its own rules as to what syntax is allowed.
- ...
7.3 - Execute external command - Else try to treat the command as an external command
7.4 - Ignore a label - Ignore the command and all its arguments if the command token begins with :
Rules in 7.2 and 7.3 may prevent a label from reaching this point.
I think the above rules for 7.1 are good, but they violate a rule that jeb posted at ECHO. FAILS to give text or blank line - Instead use ECHO/
jeb wrote: These fail if files exist like echo*, where * is one of ".[]+'`~"
Code: Select all
echo.
echo[
echo]
echo+
I agree that ECHO. fails if a file named ECHO exists. Note that the trailing . is removed by the OS.
The command fails with an error stating: 'echo.' is not recognized as an internal or external command, operable program or batch file.
This result is consistent with my proposed 7.1 rules - it is not recognized as an internal command, and eventually fails to execute as an external command.
But I cannot reproduce this behavior with ECHO[, ECHO], or ECHO+ on Windows 7. Update - I have confirmed that Win 10 behaves the same as Win 7.
If I create a file named ECHO[ and then execute the command ECHO[ then it successfully executes the internal ECHO command and prints a blank line.
The same is true with ] and +
If I could reproduce jeb's result, then it would invalidate my rules.
Did jeb get this wrong?
Or does the behavior described by jeb only apply to Win XP?
Or ...
Dave Benham
Re: Discussion about jeb's batch parsing rules on StackOverflow
Argh
I just realized that there are critical differences in batch mode vs. command line (Today I'm testing on Win 10).
I'm still trying to figure things out, but my proposed rules are definitely not quite right.
I still haven't been able to reproduce failure of ECHO[ if the file ECHO[ exists. Not in batch mode or command line mode. It always echoes a blank line.
But if ECHO[.BAT exists, then ECHO[ in batch mode executes the batch file.
In command line ECHO[ still echoes a blank line.
I'm beginning to question whether I will ever figure this out. I'm thinking that I won't succeed unless I exactly nail down how phase 2 parses tokens. Currently that is a bit fuzzy.
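The batch mode versus command line difference described above can be checked with a sketch like the following. The scratch folder and the file name case.cmd are assumptions made for illustration, and CMD /C is used here as a stand-in for typing the command at the prompt. No expected output is given; the point is to compare the two contexts on a given Windows version.

```batch
@echo off
setlocal
rem Assumption: a scratch folder under %TEMP% (the name is arbitrary).
set "work=%TEMP%\echo_bracket_demo"
if not exist "%work%" mkdir "%work%"
pushd "%work%" || exit /b 1

rem Marker script named ECHO[.BAT, as in the observation above.
> "ECHO[.BAT" echo @echo ECHO[.BAT script ran

rem Batch mode: ECHO[ as a stand-alone line of a throwaway script.
> case.cmd echo @echo off
>> case.cmd echo ECHO[
echo --- batch mode
cmd /c case.cmd

rem Command line mode: the same token passed straight to CMD /C.
echo --- command line mode
cmd /c ECHO[

del case.cmd "ECHO[.BAT"
popd
```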
Dave Benham
Re: Discussion about jeb's batch parsing rules on StackOverflow
Don't you think Microsoft would help out with this if you asked them nicely? (unless, of course, reverse engineering is the point)
Re: Discussion about jeb's batch parsing rules on StackOverflow
@penpen - Thanks for verifying that ECHO[ ECHO] and ECHO+ do not work properly on XP if a file with that name exists in the current directory.
Now could you (or anyone) confirm that ECHO[, ECHO], and ECHO+ do work properly on Win 7, 8, and/or 10 if a file with that name exists in the current directory?
It looks like we have a difference in the parsing rules for XP vs later versions.
Dave Benham
Re: Discussion about jeb's batch parsing rules on StackOverflow
I get different results than penpen.
I retested it with winXP32 and Win7 and get the same results for both.
Code: Select all
@echo off
call :test "["
call :test "]"
call :test "+"
call :test "."
exit /b
:test
call :testExt "%~1" ""
call :testExt "%~1" ".bat"
exit /b
:testExt
call :__testExt "%~1" "%~2"
if "%OK%" == "0" echo Last test FAILED, for "ECHO%~1" with FILE "ECHO%~1%~2"
exit /b
:__testExt
set ok=0
set "char=%~1"
set "EXT=%~2"
del echo* 2> nul
del echo*.bat 2> nul
echo(
echo Testing "echo%CHAR%" with existing file "echo%CHAR%%EXT%"
echo ECHO THIS IS %%0 > "echo%CHAR%%EXT%"
(
echo%CHAR% #1 in block
)
for %%A in (1) DO (
echo%CHAR% #2 in for block
)
echo%CHAR% #3 plain
set ok=1
exit /B
Output wrote:
Testing "echo[" with existing file "echo["
#1 in block
#2 in for block
#3 plain
Testing "echo[" with existing file "echo[.bat"
#1 in block
#2 in for block
The system cannot find the batch label specified - __testExt
Last test FAILED, for "ECHO[" with FILE "ECHO[.bat"
Testing "echo]" with existing file "echo]"
#1 in block
#2 in for block
#3 plain
Testing "echo]" with existing file "echo].bat"
#1 in block
#2 in for block
The system cannot find the batch label specified - __testExt
Last test FAILED, for "ECHO]" with FILE "ECHO].bat"
Testing "echo+" with existing file "echo+"
#1 in block
#2 in for block
#3 plain
Testing "echo+" with existing file "echo+.bat"
#1 in block
#2 in for block
The system cannot find the batch label specified - __testExt
Last test FAILED, for "ECHO+" with FILE "ECHO+.bat"
Testing "echo." with existing file "echo."
'echo.' is not recognized as an internal or external command,
operable program or batch file.
'echo.' is not recognized as an internal or external command,
operable program or batch file.
'echo.' is not recognized as an internal or external command,
operable program or batch file.
Testing "echo." with existing file "echo..bat"
#1 in block
#2 in for block
The system cannot find the batch label specified - __testExt
Last test FAILED, for "ECHO." with FILE "ECHO..bat"
My old statement was not quite correct, it should be:
ECHO. fails when a file "ECHO" (without extension) exists in the same directory (but when an "ECHO.BAT" file also exists, that file will be executed instead)
ECHO<?> searches for and executes a file named "ECHO<?>.bat", where <?> is one character of the list ". [ ] +"
The search for the file only occurs when the ECHO<?> is not inside a command block and is not the command of a FOR or IF command
This does not apply in command line context (echo. still fails for the dot)
jeb
PS: Some more investigations are required
echo. will fail when both "echo." and "echo.bat" exist, but when "echo..bat" exists, that file will be executed
The & && || operators modify the behaviour; they currently seem to disable the file search for "echo<?>.bat"
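The context dependence noted in the PS can be probed one case at a time. The following is a sketch only, with made-up file names (case1.cmd to case3.cmd) and an arbitrary scratch folder. Each case lives in its own script because a stand-alone ECHO[ that resolves to ECHO[.BAT would end the calling script.

```batch
@echo off
setlocal
rem Assumption: a scratch folder under %TEMP% (the name is arbitrary).
set "work=%TEMP%\echo_context_demo"
if not exist "%work%" mkdir "%work%"
pushd "%work%" || exit /b 1

> "ECHO[.BAT" echo @echo ECHO[.BAT script ran

rem Case 1: stand-alone line.
> case1.cmd echo @echo off
>> case1.cmd echo ECHO[ #1 stand-alone

rem Case 2: joined to a second command with an ampersand.
> case2.cmd echo @echo off
>> case2.cmd echo ECHO[ #2 concatenated ^& rem

rem Case 3: inside a parenthesized block.
> case3.cmd echo @echo off
>> case3.cmd echo ^(
>> case3.cmd echo ECHO[ #3 in block
>> case3.cmd echo ^)

for %%F in (case1 case2 case3) do (
  echo --- %%F
  cmd /c %%F.cmd
)

del case1.cmd case2.cmd case3.cmd "ECHO[.BAT"
popd
```

If the observations in this post hold, only the first case should run the marker script; the other two should print their own text.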
Re: Discussion about jeb's batch parsing rules on StackOverflow
Thanks jeb. That is a relief that XP is not different than later versions.
But command concatenation, command blocks, FOR, and IF alter the behavior
I was about to post a set of phase 7 rules that I thought for sure accounted for all the behavior. But your new discovery blows me away - I never thought to test for that.
I wonder if command blocks, concatenation, FOR and IF simply use command line search rules.
One critical thing I have discovered about phase 2 - A left paren ( functions as a token delimiter when parsing the command token
Dave Benham
Re: Discussion about jeb's batch parsing rules on StackOverflow
Yes, I know, and I suppose I wrote something about that fact, as I assume that "echo(" gets its special abilities from exactly there.
The next test works with "ECHO[" and the others, but with "ECHO(" it works only for percent expansion; therefore the splitting of "ECHO" and "(" has to happen in phase 2.
Code: Select all
@echo off
setlocal EnableDelayedExpansion
set "myEcho=echo("
%myEcho% #1
!myEcho! #2
for /F "delims=" %%A in ("%myEcho%") do (
%%A #3
)
Code: Select all
@echo off
setlocal EnableDelayedExpansion
set "(var= PAREN"
echo!(var!
echo!^(var!
Output wrote:
var
PAREN
My question is, why the hell is there any difference at all?
I can't believe that this is intentional, but what type of code would produce such behaviour?
jeb
Re: Discussion about jeb's batch parsing rules on StackOverflow
OK - Here are my proposed rules for how it is determined if a command is an internal command. They account for all the test results I have seen, but I haven't tested all possible permutations.
There are 4 aspects of phase 2 that are critical to understanding my phase 7 rules:
- ( functions as a token delimiter when parsing the command token
- Token delimiters preceding the command token are stripped
- Escaped token delimiters can be included in the command token
- All token delimiters after the command token are preserved in the argument list for a command when it is passed to phase 7
7.1 - Execute internal command
- The following tests are made to determine if an unquoted command token represents an internal command
  - If the command token exactly matches an internal command, then execute it.
  - Else break the command token before the first occurrence of + / [ ] or standard token delimiter
    If the preceding text is an internal command, then remember that command
    - If in command line mode, or if the command is from a parenthesized block, IF command block, FOR command block, or involved with command concatenation, then execute the internal command
    - Else (must be a stand-alone command in batch mode) scan the current folder and the PATH for a .COM, .EXE, .BAT, or .CMD file whose base name matches the original command token
      - If the first matching file is a .BAT or .CMD, then goto 7.3.exec and execute that script
      - Else (match not found or first match is .EXE or .COM) execute the remembered internal command
  - Else break the command token before the first occurrence of . \ or :
    If the preceding text is not an internal command, then goto 7.2
    Else the preceding text may be an internal command. Remember this command.
    - Break the command token before the first occurrence of + / [ ] or standard token delimiter
      If the preceding text is a path to an existing file, then goto 7.2
      Else execute the remembered command
- If an internal command is parsed from a larger command token, then the unused portion of the command token is included in the argument list
- Note that ( does not have any special meaning in phase 7 - it is not a standard token delimiter
- Just because a command token is parsed as an internal command does not mean that it will execute successfully. Each internal command has its own rules as to what syntax is allowed
7.2 - Details skipped for now
7.3 - Execute external command - Details about error detection and label detection skipped for now
- 7.3.exec - Execute the external command.
7.4 - Ignore a label - Rules in 7.2 and 7.3 may prevent a label from reaching this point
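The tests above apply only to an unquoted command token. A quick way to see the difference is to put a marker ECHO.BAT in the current folder and compare a quoted and an unquoted token. This is a sketch under assumptions: the scratch folder and the case1.cmd and case2.cmd names are arbitrary, and the expectation that the quoted form falls through to the external command search is taken from the rules in this thread, not verified here.

```batch
@echo off
setlocal
rem Assumption: a scratch folder under %TEMP% (the name is arbitrary).
set "work=%TEMP%\echo_quoted_demo"
if not exist "%work%" mkdir "%work%"
pushd "%work%" || exit /b 1

rem Marker script that shares its base name with the internal ECHO.
> ECHO.BAT echo @echo ECHO.BAT script ran

rem Case 1: unquoted token, an exact match for the internal command.
> case1.cmd echo @echo off
>> case1.cmd echo ECHO unquoted token

rem Case 2: quoted token, which the rules say skips the internal command test.
> case2.cmd echo @echo off
>> case2.cmd echo "ECHO" quoted token

for %%F in (case1 case2) do (
  echo --- %%F
  cmd /c %%F.cmd
)

del case1.cmd case2.cmd ECHO.BAT
popd
```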
Dave Benham
Re: Discussion about jeb's batch parsing rules on StackOverflow
I just reread this post, and realized that ( does not function as a command token delimiter for the first command after an unexecuted label within a parenthesized block.
Code: Select all
(
:UnexecutedLabel
echo(1 FAILS & echo(2 OK
echo(3 OK
:UnexecutedLabel
:ExecutedLabel
echo(4 OK
)
Also, a parenthesized block cannot follow immediately after an unexecuted label within a parenthesized block.
Code: Select all
(
:UnexecutedLabel
(echo 1 FAILS) & (echo 2 OK)
(echo 3 OK)
:UnexecutedLabel
:ExecutedLabel
(echo 4 OK)
)
It seems likely that the parenthesized block parser implementation is what causes ( to function as a command token delimiter, which I suspect is an unintended side effect.
Dave Benham
Re: Discussion about jeb's batch parsing rules on StackOverflow
Well, the extra test deals with discovery of a batch file (not any other type of external file), and this is often needed when using CALL. There is no obvious reason to ever CALL a batch file from the command line. (Yes there are some hacky issues that could make a command line CALL useful, but I doubt such uses were planned for)
So I'm guessing that the odd extra batch test is related to the CALL mechanism, and there are some unintended side effects that control when the test is performed, and when it is not.
Dave Benham
Re: Discussion about jeb's batch parsing rules on StackOverflow
I've incorporated the revised phase 7 rules into the SO post, along with a number of additional changes.
The last major addition that I want to make is a new answer to the SO question that collects all rules about labels into one place. Some of the information will be redundant with info in the main answer. But the rules about how GOTO and CALL parse labels will be new. Once I finish this, I will add a reference to the label answer within the main answer.
Another possible refinement is to flesh out the rules for how external commands are identified (involving current directory, PATH, PATHEXT, file associations, ...). I'm not yet committed to doing this, but I think it would be really useful.
The last major project I can think of would be to investigate and document the phase 7 option and argument parsing rules for each internal command. But I seriously doubt I will ever undertake this effort.
Dave Benham