Call and goto may fail when the batch file has Unix line endings
Posted: 06 Feb 2019 10:15
Hi,
I had a crazy problem this morning after doing a minor fix of my AutoRun.cmd script for XP.
The fix was working fine. Before committing the updated version, I added a comment in the file header. One last test running 'AutoRun -l' and... BOOM
Of course, I first thought that AutoRun.cmd itself was the culprit, as it's a new and relatively untested program. The problem had to come from one of the AutoRun scripts, like the pid script I had added late yesterday evening.
But removing the AutoRun.cmd.d directory did not help. Uninstalling AutoRun.cmd with 'AutoRun.cmd -u' (using the latest version that still worked!) did not help either.
So I started removing the changes between the last version on GitHub that worked and the broken version... Until I had removed ALL code changes, yet 'AutoRun -l' still exploded.
The only remaining thing that still differed between the working and failing versions was the comment in the header. I removed it, and suddenly 'AutoRun -l' worked fine again (on Windows 10, but not on XP!).
I reintroduced the code changes for XP. Everything still worked fine on both systems.
Adding back the comment reintroduced the problem.
Then things became really crazy: I retyped the comment in a copy of the old script that worked, and the updated version worked fine.
So I now had two scripts that both fc.exe and WinMerge told me were identical, one that worked, and one that did not!!!!
Then I did a binary comparison, and found the difference:
The failing version had TABs in the fateful comment, whereas the working version only had spaces.
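For anyone wanting to repeat that kind of binary comparison without a hex editor, here is a minimal Python sketch. The function name and the sample comment lines are made up for illustration; they are not taken from AutoRun.cmd.

```python
def first_difference(a: bytes, b: bytes):
    """Return (offset, byte_a, byte_b) for the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset, x, y
    if len(a) != len(b):
        # One buffer is a prefix of the other: report the first extra byte.
        n = min(len(a), len(b))
        return n, (a[n] if len(a) > n else None), (b[n] if len(b) > n else None)
    return None


if __name__ == "__main__":
    # Hypothetical header comments: one typed with a TAB, one with a space.
    with_tab = b"rem header\tcomment\n"
    with_space = b"rem header comment\n"
    print(first_difference(with_tab, with_space))  # (10, 9, 32): TAB (9) vs space (32)
```

On real files, read both with open(path, "rb") so that no newline or tab translation hides the difference.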
Trying to add or remove spaces in the header comment, I noticed that the first function call ended up in the middle of another line!
And the exact position it arrived at depended on the number of spaces added in the comment, hence the inconsistent error messages I had had until then.
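Since the landing position seems to depend on the number of bytes in front of the labels, it may help to print the byte offset of every label in a working and a failing variant and compare them. The following Python sketch is only an investigation aid under that assumption; it does not reproduce how cmd.exe actually scans the file, and the sample script content is hypothetical.

```python
def label_offsets(data: bytes):
    """Return (byte_offset, label) for each line that looks like a batch label."""
    results = []
    offset = 0
    for line in data.split(b"\n"):
        stripped = line.strip()  # also drops a trailing CR if the file uses CRLF
        # A single leading ':' marks a label; '::' is commonly used as a comment.
        if stripped.startswith(b":") and not stripped.startswith(b"::"):
            results.append((offset, stripped.decode("ascii", errors="replace")))
        offset += len(line) + 1  # +1 for the LF consumed by split()
    return results


if __name__ == "__main__":
    # Hypothetical LF-only script, not the real AutoRun3.cmd.
    sample = b"@echo off\nrem header\ngoto :list\n:list\necho AutoRun scripts:\n:query\n"
    for offset, label in label_offsets(sample):
        print(offset, label)
```

Appending one space to the header comment shifts every reported offset by one, which is the kind of change that altered the error message in my tests.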
By chance, I eventually noticed that my script had incorrect line endings, with Unix LFs instead of the normal Windows CRLFs.
Searching on the Internet, I found a number of people reporting similar symptoms for batch files having Unix line endings.
Changing the line endings indeed fixed the problem for all versions of AutoRun.cmd, whatever the number of spaces in the header comments.
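Checking and fixing the line endings can be scripted. Here is a minimal Python sketch, assuming the whole file fits in memory; the function names are mine, and any editor or conversion tool that writes CRLF does the same job.

```python
import re


def count_line_endings(data: bytes):
    """Count CRLF line endings and bare (Unix-style) LF line endings."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf_only": lf_only}


def to_crlf(data: bytes) -> bytes:
    """Convert every bare LF to CRLF, leaving existing CRLFs untouched."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


if __name__ == "__main__":
    # Hypothetical mixed-ending content for demonstration.
    mixed = b"@echo off\nrem header\r\ngoto :list\n"
    print(count_line_endings(mixed))           # {'crlf': 1, 'lf_only': 2}
    print(count_line_endings(to_crlf(mixed)))  # {'crlf': 3, 'lf_only': 0}
```

To apply it to a script, read the file with open(path, "rb") and write the result back with open(path, "wb"), so that Python does not translate the newlines itself.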
So, good for AutoRun.cmd: I quickly pushed an updated version with the right line endings to GitHub.
Still, I think we have here an opportunity to investigate this issue further.
Contrary to what other people have reported on the Internet, a batch file may work perfectly well with Unix line endings. My AutoRun.cmd script worked fine for days on several machines, running various versions of Windows from 32-bit XP to 64-bit Windows 10.
I'm attaching below the most stripped-down version I have so far that still exhibits the problem.
The remaining code is very simple: it sets a few variables, then jumps to the :list function, which in turn calls the :query function twice.
Extract AutoRun3.cmd from the attached zip file, and run it in a new cmd.exe window without any argument. (Don't do it in a cmd.exe window with important information displayed, as this may blow up that window!) This displays the following error:
Code: Select all
C:\JFL\Temp>autorun3
The system cannot find the batch label specified - list
C:\JFL\Temp>
Yet the :list label is present in the file!
If you carefully append one space to one of the header comment lines, you'll see a very different error message:
Code: Select all
C:\JFL\Temp>autorun3
AutoRun scripts:
C:\JFL\Temp>for %h in (HKLM HKCU) do call :query %h
C:\JFL\Temp>call :query HKLM
C:\JFL\Temp>putVarName
'putVarName' is not recognized as an internal or external command,
operable program or batch file.
...
So this time, it finds the :list label, but it's the call to the :query routine that ends up in the middle of the comment line preceding the :query label!
The challenge now is to further simplify the script while still seeing the problem... Until maybe we can understand the root cause.
And maybe we can even use that knowledge to do crazier things.