Bug/Mystery in the phase parsing rules 1.5 and 2 CR vs redirect

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Bug/Mystery in the phase parsing rules 1.5 and 2 CR vs redirect

#16 Post by dbenham » 31 Jan 2020 20:20

Very interesting. Your discovery about ^ before redirection is definitely important, and unsettling :evil:

I've also done a bunch of tests with some annotation in the code. It will be hard to read unless you copy the code into your own text editor.
Some of the tests aren't so much about token dropping, but rather demonstrate some of the rules in my earlier post.
But the majority investigate token dropping. The analogs to your recent tests begin at #14.
It is odd how the parser seems to have a hard time deciding if the subsequent redirection is escaped or not.

Unfortunately my test #11 disproves my theory that the token dropping is due to the redirection parser blindly parsing the next token - that token dropping requires 2 tokens after the redirection token :(

Code: Select all

@echo off

:: Handle line continuation with fatal syntax error without crashing out
if "%~1" neq "" (
  echo on
  call :%1
  exit /b %= exit really shouldn't be needed =%
)

setlocal
prompt $G

echo on
cls

:: Basic token dropping ceases after token delimiter found after next token
echo #1 Hello >con abc^
xyz^
 123^
world! ^
Nice^
 to see you.

:: Basic token dropping - next token does not have to start on first line
echo #2 Hello >con ^
 xyz^
world! Nice^
 to see you.

:: Token dropping continuation does not escape next character
echo #3 Hello >con abc^
&echo world

:: Redirection token must be complete before token dropping begins
echo #4 Hello >^
c^
o^
n ^
abc^
world!

:: Next token begins immediately after &n, no delimiter needed :(
:: Beyond that, "normal" token dropping.
echo #5 Hello >&2abc ^
123^
world!

:: No token dropping here because we already have token delim after next token
echo #6 Hello >&2world! ^
Nice to see you.

:: &n does not work with token delimiter in front
cmd /c "echo #7 Hello > &2 world"

:: ^&n works, subsequent characters in same token ignored, but in ECHO output
echo #8 Hello> ^&2abc world

:: Doesn't need to be a new token for ^&n to ignore subsequent chars
echo #9 Hello>^&2abc world

:: Thankfully this fails
cmd /c "echo #10 Hello >&bad world"

:: Damn! This still has token dropping :(
:: So not as simple as parser blindly grabbing next token.
:: Rather parser wants next token after destination found.
:: WHY?
echo #11 Hello > ^&2 abc^
world

:: Still "normal" token dropping
echo #12 Hello >con abc^
<nul abc^
world

:: No token actually dropped with this line continuation before finding
:: a token after the redirection, but the token drop behavior of not escaping
:: the next character is still in play
echo #13 Hello >con ^
&echo world


cmd /c "%~f0" :14 & goto :skip
:14
:: This is the beginning of total weirdness. Same as 13 except substitute
:: redirection for concatenation. But now the original redirection is lost
:: and the next token is used, with the 2nd redirection escaped! HUH?
echo #14 Hello world >dropped ^
<nul

:skip

:: Same as test 14, except add a 2nd token on the continued line.
:: Now the 1st destination returns, the 2nd redirection is no longer
:: escaped, but the destination in that token is ignored, and the next
:: token is used instead. WTF!
echo #15 Hello world >con ^
<dropped nul


:: The whacky destination substitution can be continued onto the next line
echo #16 Hello>con ^
<dropped ^
nul world

:: Surprising how many token delimiters are dropped from the output
echo #17 Hello>con ^
<dropped nul ^
 world

:: Escaped token delimiter can be part of destination
echo #18 Hello>con ^
<dropped nul^
 destinationContinued world

:: Same as 18 except ^&n still not bothered by extra chars
echo #19 Hello>con 4<nul ^
<dropped ^&4^
 destinationContinued world

:: No difference between input vs output
echo #20 Hello 4>nul ^
>dropped con ^
world
--OUTPUT--

Code: Select all

>echo #1 Hello world! Nice to see you. 1>con
#1 Hello world! Nice to see you.

>echo #2 Hello world! Nice to see you. 1>con
#2 Hello world! Nice to see you.

>echo #3 Hello  1>con  & echo world
#3 Hello
world

>echo #4 Hello world! 1>con
#4 Hello world!

>echo #5 Hello abc 123world! 1>&2
#5 Hello abc 123world!

>echo #6 Hello world! Nice to see you. 1>&2
#6 Hello world! Nice to see you.

>cmd /c "echo #7 Hello > &2 world"
& was unexpected at this time.

>echo #8 Hello world 1>&2abc
#8 Hello world

>echo #9 Hello world 1>&2abc
#9 Hello world

>cmd /c "echo #10 Hello >&bad world"
>& was unexpected at this time.

>echo #11 Hello world 1>&2
#11 Hello world

>echo #12 Hello world 1>con 0<nul
#12 Hello world

>echo #13 Hello  1>con  & echo world
#13 Hello
world

>cmd /c "C:\test\test.bat" :14   & goto :skip
The syntax of the command is incorrect.
><nul

>echo #15 Hello world  1>con 0<nul
#15 Hello world

>echo #16 Hello world 1>con 0<nul
#16 Hello world

>echo #17 Hello world 1>con 0<nul
#17 Hello world

>echo #18 Hello world 1>con 0<nul destinationContinued
The system cannot find the file specified.

>echo #19 Hello world 1>con 4<nul 0<&4 destinationContinued
#19 Hello world

>echo #20 Hello world 4>nul 1>con
#20 Hello world

#17 Post by dbenham » 02 Feb 2020 10:14

penpen wrote:
31 Jan 2020 18:45
I suspect that the "^<nul cmdToken" part somehow recognizes the redirection "<nul" removing that part from the input buffer(/string) and then the '^'-character doubles the '<'-character, so that "< cmdToken" overwrites the previous result.
Interesting theory - a stuttering parser. And plausible given the evidence so far.

But now I have extended your example and made it even weirder.

Code: Select all

@echo off
setlocal

:: Handle line continuation with fatal syntax error without crashing out
if "%~1" neq "" (
  echo on
  call :%1
  exit /b %= exit really shouldn't be needed =%
)

prompt $G
cls
echo on

:: Reproduce penpen's single line token drop.
:: Redirection followed by what should be escaped redirection
:: but in reality 2nd redirection active, but concatenated
:: destination replaced by subsequent token
echo #1 Hello >con ^<dropped nul world

@echo( & cmd /c "%~f0" :2 & goto :skip
:2
:: Throw a 3rd redirection into the mix, and now not a single
:: redirection is completed. Note how there is no line reorganization
:: nor file handle inserted in echo output.
echo #2 Hello >Incomplete ^<fail 4>fail

:skip
@echo( & @cmd /c "%~f0" :3 & goto :skip
:3
:: Same result. Don't have to change the file handles or operator
echo #3 Hello <aaa ^<bbb <ccc world

:skip
--OUTPUT--

Code: Select all

>echo #1 Hello  world 1>con 0<nul
#1 Hello  world

4> was unexpected at this time.
>echo #2 Hello >Incomplete ^<fail 4>fail

< was unexpected at this time.
>echo #3 Hello <aaa ^<bbb <ccc world
Tests #2 and #3 result in a syntax error. But more importantly, the echo output has no line reorganization nor any file handle inserted before the redirection symbols.
So adding a 3rd redirection causes all redirection parsing to fail entirely.

This tells me the penpen stutter in #1 is likely an artifact of some more fundamental issue.

At this point I'm nearly at a loss. I feel like we will continue to find more and more bizarre behavior as we throw more at the redirection parser.

The only thing that seems "clear" to me is that the redirection parser looks at additional tokens after the redirection is logically complete in our eyes. We have no idea why this is happening.
If in those subsequent tokens there is an escaped redirection, then the parser begins a new redirection before the prior one is complete, and gets very confused.


Dave Benham

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

#18 Post by jeb » 02 Feb 2020 11:57

dbenham wrote:
02 Feb 2020 10:14
Tests #2 and #3 result in a syntax error. But more importantly, the echo output has no line reorganization nor any file handle inserted before the redirection symbols.
So adding a 3rd redirection causes all redirection parsing to fail entirely.
I suppose you tricked yourself.
Your result looks strange, but the conclusion is probably wrong.
You only need to add a single word.
You triggered the "remove token" mechanism without a newline. I didn't know that this was possible, or at least I forgot it :)

Code: Select all

echo #2 Hello >Incomplete ^<fail 4>fail
echo #2 Hello >Incomplete ^<fail NEWWORD 4>fail

Code: Select all

@set "prompt=$G "
echo #1 One 1>&2 ^<REMOVED2 NUL 
echo #2 One      ^<VISBLE_1 VIS
echo #3 Hello 2>Incomplete ^<fail NUL 4>fail

Code: Select all

> echo #1 One   1>&2 0<NUL 
#1 One  

> echo #2 One      <VISBLE_1 VIS 
#2 One      <VISBLE_1 VIS

> echo #3 Hello  2>Incomplete 0<NUL 4>fail 
#3 Hello 

#19 Post by dbenham » 02 Feb 2020 12:57

jeb wrote:
02 Feb 2020 11:57
You triggered the "remove token" mechanism without a newline. I didn't know that this was possible, or at least I forgot it :)
My test #1 is simply a reformat of one of penpen's tests - who in turn claims he got it from you at viewtopic.php?p=32687#p32687. :lol:
jeb wrote:
02 Feb 2020 11:57
dbenham wrote:
02 Feb 2020 10:14
Tests #2 and #3 result in a syntax error. But more importantly, the echo output has no line reorganization nor any file handle inserted before the redirection symbols.
So adding a 3rd redirection causes all redirection parsing to fail entirely.
I suppose you tricked yourself.
Your result looks strange, but the conclusion is probably wrong.
You only need to add a single word.
Yes, your result #3 is the result I expect based on the results of prior experiments. It is basically the same as my test #1 except it has an extra redirection tacked on.

I think you missed the point I was trying to make. I didn't mean to imply that you cannot have 3 redirections if the middle one is escaped. My intent was to show what happens when you have 3 redirections without any intervening tokens, as in "<one ^<two <three". Sure the statement fails, but look what happens to the phase 3 echo output. In my test #1, as well as all prior "token dropping" tests, the first redirection was successfully parsed, so it appears at the end with the file handle inserted. But in my tests #2 and #3 the third phase echoes the original line without any manipulation. This tells me that not even the first redirection was fully parsed :shock:

My test #2 is not as obvious because I forgot to add the " world" at the end. But it is very obvious in test #3.


Dave Benham

#20 Post by jeb » 02 Feb 2020 15:48

dbenham wrote:
02 Feb 2020 12:57
jeb wrote:
02 Feb 2020 11:57
You triggered the "remove token" mechanism without a newline. I didn't know that this was possible, or at least I forgot it :)

My test #1 is simply a reformat of one of penpen's tests - who in turn claims he got it from you at viewtopic.php?p=32687#p32687.
That's the good side of forgetting so much, I can be happy to discover the same things multiple times without even knowing it :D
dbenham wrote:
02 Feb 2020 12:57
I think you missed the point I was trying to make. I didn't mean to imply that you cannot have 3 redirections if the middle one is escaped. My intent was to show what happens when you have 3 redirections without any intervening tokens, as in "<one ^<two <three". Sure the statement fails, but look what happens to the phase 3 echo output. In my test #1, as well as all prior "token dropping" tests, the first redirection was successfully parsed, so it appears at the end with the file handle inserted. But in my tests #2 and #3 the third phase echoes the original line without any manipulation. This tells me that not even the first redirection was fully parsed
You're absolutely right, I missed your point completely :oops:

But you are still wrong 8) , there is no strange behavior of the parser.
What you see is the special output of any syntax error (obviously only when the error results in the line being output anyway).
When a syntax error occurs and the line is shown, then the unmodified line after phase 1 is shown :!:
There you can see single carets that should have disappeared, and even CRs are visible.
That's the key to my alternative technique in "How to receive even the strangest command line parameters?"

Code: Select all

@echo off
@set "prompt=$G "

set "var=single ^^ caret"
for /f "skip=1" %%C in ('"echo(| replace ? . /w /u"') do set "$CR=%%C"

call :test %%var%%
exit /b

:test
echo on
REM %*
echo #2 Hello >Incomplete ^<fail 4>fail asdf %* %$CR%###

Code: Select all

> REM single ^ caret 
4> was unexpected at this time.

###cho #2 Hello >Incomplete ^<fail 4>fail asdf single ^ caret 
Btw. I think I remember a post some time ago from one of the experts about the disappearing tokens.
He had used a debugger and said something about buffer pointer resets after multiline carets.
But I can't remember the details, nor can I find the post :(

#21 Post by dbenham » 02 Feb 2020 16:47

jeb wrote:
02 Feb 2020 15:48
dbenham wrote:
02 Feb 2020 12:57
I think you missed the point I was trying to make. I didn't mean to imply that you cannot have 3 redirections if the middle one is escaped. My intent was to show what happens when you have 3 redirections without any intervening tokens, as in "<one ^<two <three". Sure the statement fails, but look what happens to the phase 3 echo output. In my test #1, as well as all prior "token dropping" tests, the first redirection was successfully parsed, so it appears at the end with the file handle inserted. But in my tests #2 and #3 the third phase echoes the original line without any manipulation. This tells me that not even the first redirection was fully parsed
You're absolutely right, I missed your point completely :oops:

But you are still wrong 8) , there is no strange behavior of the parser.
What you see is the special output of any syntax error (obviously only when the error results in the line being output anyway).
When a syntax error occurs and the line is shown, then the unmodified line after phase 1 is shown :!:
There you can see single carets that should have disappeared, and even CRs are visible.
That's the key to my alternative technique in "How to receive even the strangest command line parameters?"
Of course. Thanks. I failed to think about the difference between a syntax error and redirection failure.

In all the previous token dropping tests the line was successfully parsed in its entirety through phase 2, the parsed line is echoed in phase 3, and then the redirection execution fails in phase 5.5.

But my last tests have a syntax error in phase 2, and indeed I forgot that phase 3 echos the "raw" phase 1 output in that case.
So the token dropping results in ECHO HELLO <AAA ^<BBB <CCC WORLD being parsed as ECHO HELLO <AAA < <CCC WORLD. The 2nd redirection is expecting a destination, but receives an unescaped <, so of course that is a syntax error. The same result can be achieved simply without token dropping by ECHO HELLO < <CCC WORLD.
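
As a rough way to check that equivalence, the two failing forms can be run side by side in child cmd.exe instances, so that a syntax error does not abort the calling script. This is only a sketch: it assumes the single-line form behaves the same when passed through cmd /c as it does inside a batch file, which has not been verified in this thread. The names aaa, bbb and ccc are placeholders and are not expected to exist.

```batch
@echo off
:: Sketch only. aaa, bbb and ccc are placeholder names.
:: Each test runs in a child cmd.exe so a syntax error cannot abort this script.

:: Three redirections, the middle one escaped (the form used in test #3)
cmd /c "echo Hello <aaa ^<bbb <ccc world"

:: A redirection operator followed directly by another redirection operator
cmd /c "echo Hello < <ccc world"
```

If the explanation above holds, both lines should fail with the same kind of "was unexpected at this time" message rather than with a file-not-found error.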

I feel much better now that we are back to "run-of-the-mill" insanity.


Dave Benham

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

#22 Post by penpen » 02 Feb 2020 18:14

jeb wrote:
02 Feb 2020 15:48
Btw. I think I remember a post some time ago from one of the experts about the disappearing tokens.
He had used a debugger and said something about buffer pointer resets after multiline carets.
But I can't remember the details, nor can I find the post :(
I thought I found something new when my test.bat ...

Code: Select all

@set "prompt=#"
echo 1234567890 <aaa ^<bb^
b <ccc
.. results in:

Code: Select all

< was unexpected at this time.

#b <ccc
But sadly, after a day of testing, I only showed that the buffer pointer was reset after a multiline caret... frustrating...

I'm not sure whether I have lost track of what is left to do. Am I right to say the following?
The three-redirection mess and the multiline token drop are covered, but we still have to find out exactly how the one-liner ("echo >con ^<nul cmdToken and some params") drops the nul.

penpen

#23 Post by dbenham » 02 Feb 2020 22:52

I don't see any real difference between the single line vs multi-line forms of this destination dropping.

Code: Select all

@echo off
setlocal
prompt $G
echo on

echo #1 >con ^<dropped nul abc 123

echo #2 >con ^
<dropped nul abc 123
--OUTPUT--

Code: Select all

>echo #1  abc 123 1>con 0<nul
#1  abc 123

>echo #2  abc 123 1>con 0<nul
#2  abc 123
Neither one makes any sense to me, but at least it seems predictable.

Dave Benham

#24 Post by penpen » 03 Feb 2020 11:10

You could also drop tokens if you escape a leading redirection:

Code: Select all

^<skipped nul echo OK.
And just for the record, two results from yesterday (related to redirection; win version 10.0.18362.592):

Code: Select all

@set "prompt=#"
@cls

"<com" nul echo Test 1: Expected a different error, see tests 2 and 3.
:: funny reason:
"<com"
"<con" nul echo Test 2: Expected error, OK.
"<lpt" nul echo Test 3: Expected error, OK.

^<"con" "nul" set /p "=Test 4: Dropped token, device "nul" == nul, OK."
@echo(

@set "prompt="
@goto :eof
Result:

Code: Select all

#"<com" nul echo Test 1: Expected a different error, see tests 2 and 3.
Parameter format not correct - nul

#"<com"
Active code page: 850

#"<con" nul echo Test 2: Expected error, OK.
'"<con"' is not recognized as an internal or external command,
operable program or batch file.

#"<lpt" nul echo Test 3: Expected error, OK.
'"<lpt"' is not recognized as an internal or external command,
operable program or batch file.

#set /p "=Test 4: Dropped token, device "nul" == nul, OK." 0<"nul"
Test 4: Dropped token, device "nul" == nul, OK.
penpen

#25 Post by jeb » 03 Feb 2020 13:25

What the hell?

Where is the relation between "<com" and chcp?

Code: Select all

"<com" /?
Displays or sets the active code page number.

CHCP [nnn]

nnn    Specifies a code page number.

Type CHCP without a parameter to display the active code page number.
But I hope this doesn't depend on the redirection parser.
Perhaps it's only a strange side effect of the "<" wildcard mechanism (Dave has written about that)

Code: Select all

cd c:\windows\system32
dir "<com"

Directory of c:\Windows\System32

14.07.2009  00:25            12.800 chcp.com
12.04.2011  08:43    <DIR>          com
14.07.2009  00:25            15.360 diskcomp.com
14.07.2009  00:25            12.800 diskcopy.com
14.07.2009  00:25            34.304 format.com
14.07.2009  00:25            30.208 mode.com
14.07.2009  00:25            24.576 more.com
14.07.2009  00:25            18.944 tree.com

#26 Post by penpen » 03 Feb 2020 13:53

jeb wrote:
03 Feb 2020 13:25
But I hope this doesn't depend on the redirection parser.
Perhaps it's only a strange side effect of the "<" wildcard mechanism (Dave has written about that)
Yes, it's the wildcard mechanism: "<com" is just the first .com file in the path (like "finds<<").

Maybe I used "related to redirection" a bit too loosely:
I tested whether redirection could be done when using double-quoted (partial/full) redirections (answer: no).


penpen

#27 Post by dbenham » 03 Feb 2020 17:04

Yep - the "well documented" non-standard wildcards :wink:

You have to be careful with that "<xxxxx" construct 'cause you never know what you might get. :evil:

#28 Post by dbenham » 03 Feb 2020 22:38

penpen wrote:
03 Feb 2020 11:10
You could also drop tokens if you escape a leading redirection:

Code: Select all

^<skipped nul echo OK.
Ooooh, that is interesting.

It is interesting that ECHO ^<NUL HELLO fully escapes the redirection (it becomes a literal), but ^<NUL ECHO HELLO does not. The only difference of course is the latter occurs before the command token.

Up until now I thought the destination dropping was a special case of token dropping. But now I'm not so sure. I suspect there may be fundamentally different mechanisms at play.

The full token dropping only occurs after redirection with line continuation. The token dropping can be chained through multiple line continuations.

The destination dropping always occurs whenever a redirection is escaped at the beginning of a line (before the command token is discovered) or immediately after a normal redirection. The escape could be on the same line, or the prior line with line continuation. The destination dropping cannot be chained - the next token always replaces what should be the destination.
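
The contrast between the two positions of an escaped redirection can be tried with a pair of lines like the following. This is a minimal sketch built from the commands quoted in this thread; "skipped" is an arbitrary placeholder, and the behavior described in the comments for the second line is what the posts above report, not something guaranteed across Windows versions.

```batch
@echo off
:: After the command token: the caret escapes "<", so ECHO receives
:: the text "<nul Hello" as an ordinary argument.
echo ^<nul Hello

:: Before the command token: per the posts above, "skipped" is dropped
:: and "nul" becomes the stdin source, so only "Hello" is echoed.
^<skipped nul echo Hello
```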

#29 Post by dbenham » 04 Feb 2020 14:37

I found a very interesting case of token dropping that involves a single redirection without any escape character or line continuation. It only occurs on the last line of a batch script if the line is not terminated with a linefeed.

Code: Select all

@echo off
setlocal
prompt $G
echo on
echo Hello >con world_No_line_feed_at_the_end_of_this_line
--OUTPUT--

Code: Select all

>echo Hello  1>con
Hello
Like other token dropping situations, the drop may be continued as long as there does not exist an unescaped/unquoted token delimiter after the beginning of the trailing token.
The following gives the same result as above

Code: Select all

echo Hello >con dropped^
dropped^
dropped_no_LF_at_end_of_line
Escaped token delimiters can be part of the dropped token

Code: Select all

echo Hello >con drop^ ped_No_LF_at_end_of_line
But the following has two tokens, the first being the escaped space. So there is no token dropping in this case.

Code: Select all

echo Hello >con ^  world_No_LF_at_end_of_line
--OUTPUT--

Code: Select all

Hello    world_No_LF_at_end_of_line
When I get a chance I think I will try to write some updated rules that capture the behaviors we have seen so far. It makes no sense, but I think I can predict the results.
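
For anyone who wants to reproduce the missing-linefeed tests without depending on an editor setting, one option is to generate the test script from another batch file. The following is a minimal sketch, not the method used for the tests above: the file name noLF_test.bat and the variable name test are made up for illustration, and it relies on SET /P printing its prompt text without a line terminator when input comes from NUL.

```batch
@echo off
setlocal
:: noLF_test.bat is a hypothetical file name chosen for this sketch.
set "test=%temp%\noLF_test.bat"

:: Ordinary lines, each terminated with CR/LF by ECHO
>"%test%" echo @echo off
>>"%test%" echo prompt $G
>>"%test%" echo echo on

:: SET /P with input from NUL writes its prompt text without a line
:: terminator, so the last line of the generated script has no linefeed.
>>"%test%" <nul set /p "=echo Hello >con world_No_line_feed_at_the_end_of_this_line"

call "%test%"
@echo off
del "%test%"
```

Swapping the text in the SET /P line for one of the other variants above should give the corresponding test, as long as the text stays on one line and contains no double quotes.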
