
Alternate method to get TAB, Carriage return and possibly all others

Posted: 21 Apr 2018 12:48
by sst
As you may all know, every 32-bit and 64-bit Windows PE image (executables, DLLs, OCXs, ...) starts with a tiny DOS stub, known as the MZ header, which has just enough logic to display the message "This program cannot be run in DOS mode." if the executable is launched from real DOS mode. That stub in turn holds a pointer (at offset 0x3C) to the PE header of the executable, so Windows can recognize and load the actual image.

Since Microsoft released its first 32-bit OS, Windows NT 3.1, nearly 25 years ago, that stub has not changed a bit, and in the latest version of Windows 10 the MZ header is bit for bit the same as it was 25 years ago (with the exception of the PE pointer at offset 0x3C of course, which differs between executables, not just between Windows versions).

In fact, Windows itself has nothing to do with that MZ header; it is the output of the Microsoft linker in its compiler toolset. The header has not changed over all these years, and there is no reason for Microsoft to change it in the future.

So this opens the opportunity to extract the useful bytes from that header in a way that is consistent across all versions of Windows (at least from Win2K) and produces the same, predictable result.

To start, let's take a look at the contents of this header:

[Image: hex dump of the MZ header, with the bytes of interest marked in red]

The most interesting bytes are those marked in red; the rest are not very useful by themselves unless we also have other extended or control characters.
So we have TAB at offset 0x46, the 0xFF delimiter at offsets 0x0C and 0x0D, and two consecutive carriage returns starting at offset 0x75, which makes it possible to grab the first CR.

We can grab TAB from the header with this code:
EDIT: Revised code (May/10/2018):

Code: Select all

((for /L %%P in (1,1,70) do pause>nul)&set /p "TAB=")<"%COMSPEC%"
set "TAB=%TAB:~0,1%"
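
One way to sanity-check the captured character is to rely on the fact that FOR /F splits on space and TAB by default. The following is only a sketch, assuming the default Microsoft stub in %COMSPEC% and a single-byte code page; the sample words "left" and "right" and the variable name Second are made up for illustration.

```batch
@echo off
setlocal EnableExtensions

rem Capture TAB exactly as in the revised code above.
set "TAB="
((for /L %%P in (1,1,70) do pause>nul)&set /p "TAB=")<"%COMSPEC%"
if defined TAB set "TAB=%TAB:~0,1%"

rem FOR /F uses space and TAB as default delimiters, so the sample
rem string only splits into two tokens if TAB really holds a tab.
set "Second="
for /F "tokens=1,2" %%A in ("left%TAB%right") do set "Second=%%B"

if "%Second%"=="right" (
    echo TAB capture looks good.
) else (
    echo TAB capture failed, check the active code page.
)
```

If the variable ends up holding some other character (for example a letter), the sample stays a single token and the check reports a failure.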
Old code for reference:

Code: Select all

set "TAB="
for /F "delims=" %%Z in (
    '^(type "%COMSPEC%"^|^(^(for /L %%P in ^(1^,1^,70^) do @pause^>nul^)^&set /p "TAB="^&call echo^,%%TAB%%^)^)2^>nul'
) do set "TAB=%%Z"
set "TAB=%TAB:~0,1%"
Grabbing 0xFF (nbsp) is similar (Revised)

Code: Select all

((for /L %%P in (1,1,12) do pause>nul)&set /p "NBSP=")<"%COMSPEC%"
set "NBSP=%NBSP:~0,1%"

And for CR

Code: Select all

for /F "tokens=1* delims=." %%Y in (
    'type ^"%COMSPEC%^"^|^(^(for /L %%P in ^(1^,1^,78^) do @pause^>nul^)^&2^>nul ^"%SystemRoot%\system32\findstr.exe^" /B /C:^"This program cannot be run in DOS mode.^"^)'
) do set "CR=%%Z"
With DelayedExpansion it is easier to read:

Code: Select all

setlocal EnableDelayedExpansion
set "q=""
for /F "tokens=1* delims=." %%Y in (
    '"type !q!%COMSPEC%!q!|((for /L %%P in (1,1,78) do @pause>nul)&2>nul !q!%SystemRoot%\system32\findstr.exe!q! /B /C:!q!This program cannot be run in DOS mode.!q!)"'
) do set "CR=%%Z"
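
The loop counts used in the snippets above are simply byte offsets written in decimal: each `pause>nul` consumes one byte, so the count equals the number of bytes to skip before the wanted one. As a small worked check (a sketch only; the variable names are made up here, and the offsets assume the default Microsoft stub), SET /A can do the hex-to-decimal conversion:

```batch
@echo off
setlocal EnableExtensions

rem SET /A accepts the 0x prefix for hexadecimal numbers.
rem 0x0C: first 0xFF byte of the header
set /a "SkipNBSP=0x0C"
rem 0x46: the TAB byte
set /a "SkipTAB=0x46"
rem 0x4E: start of the DOS mode message text
set /a "SkipText=0x4E"

echo NBSP: skip %SkipNBSP% bytes
echo TAB : skip %SkipTAB% bytes
echo Text: skip %SkipText% bytes
```

This prints 12, 70 and 78, the same counts used by the NBSP, TAB and CR snippets.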

I know, I know, we already have a nice, neat, small, fast, understandable and excellent method for grabbing CR:

Code: Select all

for /F %%Z in ('copy /Z "%~dpf0" nul') do set "CR=%%Z"
The two are not comparable in terms of simplicity and speed. Besides its long, ugly and cryptic logic, the new method for grabbing CR is also much slower (though not noticeably), because it involves creating 4 concurrent processes (3 of cmd and 1 of findstr) to grab a single CR character.

But for me, and I'm sure for many of you, a big part of the reason we even put effort into programming in batch is the fun and pleasure of challenging ourselves to solve problems in a limited and at the same time nostalgic environment, and it adds even more to the fun when you know CR is a very hard character to catch.
Aside from that, it is always good to have alternate methods at hand, even if we never use them. For instance, however unlikely, if Microsoft decides to change the output format of copy /Z, or makes it stop producing CR when the target file is nul, then we have an alternate method which works from Win2K and up. Not so bad.
EDIT (May/10/2018)
Another method to obtain CR, along with an explanation of the potential failure of the COPY /Z method, is given by carlos in the next reply to this post.

Important Note:
As you will see in the discussions in this thread, because of the presence of extended ASCII characters in the MZ header, the above code cannot be used to reliably capture the mentioned characters in all locales and code pages, especially on systems where a multi-byte code page like 932 or 950 is active (Chinese, Korean, Japanese, ...). You will see the workaround to overcome the limitation, but it is generally not worth the effort for obtaining a single character like TAB.
But if it works on your system, you can safely use it for your own local needs.

Continue reading to see alternate methods for obtaining all ASCII characters.
/EDIT


And for the sake of completeness, this is the code which grabs all 13 non-printable characters (except <NULL> of course) in the MZ header at once and saves them to a file.
It should work the same on Win2K to Win10. (Personally, I've tested with 2K, XP, Win7 and Win10.)

Code: Select all


:: This should produce the exact same output running from Win2k through Win10

@echo off
setlocal EnableExtensions

call :GetMZBits
:: Now we have access to ASCII_xx vars

setlocal EnableDelayedExpansion
    set "prompt="
    for %%i in (01 03 04 09 0D 0E 90 1F B4 B8 BA CD FF) do set "prompt=!prompt!!ASCII_%%i!"
    "%COMSPEC%" /d /k<nul>"PE_MZ_Header_Special_Chars.bin"
endlocal

echo,
echo The non-printable chars from Standard MZ header of PE images have been extracted and written to "PE_MZ_Header_Special_Chars.bin"
echo,
echo It should have the sequence:
echo     0x01,0x03,0x04,0x09,0x0D,0x0E,0x90,0x1F,0xB4,0xB8,0xBA,0xCD,0xFF
echo,
echo Check with your Hex viewer.
pause
exit /b

:GetMZBits
setlocal EnableDelayedExpansion
set "q=""
(set \n=^%===<no_white_spaces_allowed_before_or_after_me>===%
%============<no_white_spaces_allowed_before_or_after_me>===%
%============<no_white_spaces_allowed_before_or_after_me>===%)
set "@{DeferExecute}=(endlocal!\n!"

for /F "usebackq tokens=1-3 delims=, " %%1 in ('
     2,0,90,!\n!
     4,0,03,!\n!
     8,0,04,!\n!
    13,0,FF,!\n!
    64,4,0E1FBA,!\n!
    69,4,B409CD,!\n!
    73,2,B801
') do (
    set "Capture="
    for /F "delims=" %%Z in (
        '"(type !q!%COMSPEC%!q!|((for /L %%P in (1,1,%%1) do @pause>nul)&set /p !q!Capture=!q!&call echo,%%Capture%%))2>nul"'
    ) do set "Capture=%%Z"
    set "Index=%%3"
    for /L %%i in (0,2,%%2) do (
        set /a "j=%%i/2"
        for %%j in (!j!) do (
            set "@{DeferExecute}=!@{DeferExecute}!set ASCII_!Index:~%%i,2!=!Capture:~%%j,1!!\n!"
        )
    )
)
set "@{DeferExecute}=!@{DeferExecute}!for /F !q!tokens=1* delims=.!q! %%Y in ('type ^^!q!%COMSPEC%^^!q!^^|^^(^^(for /L %%P in ^^(1^^,1^^,78^^) do @pause^^>nul^^)^^&2^^>nul ^^!q!%SystemRoot%\system32\findstr.exe^^!q! /B /C:^^!q!This program cannot be run in DOS mode.^^!q!^^)') do set ASCII_0D=%%Z!\n!)"
%@{DeferExecute}%
exit /b
Now I'm thinking about attacking the NLS files, the code page files in which you can find all 255 byte values. The good news is that they are also unchanged, at least since the days of Win2K; well, not all of them, but many of them.
For instance, these files are all the same from Win2K to Win10:

Code: Select all

c_437.nls          SHA1: 244ca701ca85e1ad389519e7c6655e609f70f39c
c_1252.nls         SHA1: 355e2ada0b9ea4f2a844c2d236d1b48336881b22
c_936.nls          SHA1: 74f6157dbd1fe91acaf322a459019c1bb719604c
c_949.nls          SHA1: c64638cdf715f2c4ef5e29ba7b680fdc8e9bf736
c_950.nls          SHA1: 1cd3f1ccf03d2b2d00dd3a133a6bebaa0e1bdb89
c_1252.nls has all the characters up to 255 in a row, except for the range 0x80-0x9F, some of which are present at random locations,
and the missing range can be found in c_950.nls, for example.
It should be easy to extract the bits from those files, but I currently (maybe for one or two days) don't have time to delve further into this. So this may be the subject of more investigation if anyone is interested.

EDIT (May/10/2018)
Preliminary demo code for obtaining the complete ASCII table from NLS files was made available on Apr/27/2018.
Improved code was posted on May/10/2018.

Other alternate methods are also available: carlos's genchr (actually from 2014, but discussed again in this thread and under review again for improvement), and a method by penpen, which uses a very nice code page transformation technique that does not rely on any external binary file format.

So continue reading.

Re: New Universal method to get TAB, Carriage return and possibly all others

Posted: 21 Apr 2018 17:23
by carlos
Very interesting post, sst.

For getting CR, I don't find this one excellent:

Code: Select all

for /F %%Z in ('copy /Z "%~dpf0" nul') do set "CR=%%Z"
Because it fails if the batch script has the read-only attribute. More details here: viewtopic.php?f=3&t=4741&start=90#p40757

I prefer this instead:

Code: Select all

set "CR=" &For /F "skip=1" %%a in (
'"echo(|replace.exe ? . /u /w"'
) do if not defined CR set "CR=%%a"
I find the file %windir%\system32\c_1252.nls that you mention very interesting, because it has almost all or all of the characters (I have not looked that closely).

The only problem is: even if we know how to handle that file, what guarantees that it will not change in the next version of Windows?

Have you looked at the genchr routine for generating binary data? Does it work on Windows 2000? Here: viewtopic.php?t=5326#p32108

Your post is very interesting. I liked learning about the NLS files.

And the GetMZBits routine that you wrote is very good.
You can replace this portion so that you don't have to care about spaces.

Code: Select all

(set \n=^%===<no_white_spaces_allowed_before_or_after_me>===%
%============<no_white_spaces_allowed_before_or_after_me>===%
%============<no_white_spaces_allowed_before_or_after_me>===%)
with the following, which has the sole disadvantage of working with a temporary file.

Code: Select all

Set "tmplf=%Windir%\Temp\lf.tmp"
Echo(|(Pause >Nul &Findstr "^" &Set /P "=#" < Nul) > "!tmplf!"
Set /P "LF=" < "!tmplf!" & Set "LF=!LF:~0,1!"

Set "\n=!LF!"
Your code for setting the TAB variable is excellent. I like it.

Re: New Universal method to get TAB, Carriage return and possibly all others

Posted: 23 Apr 2018 16:24
by carlos
"Check with your Hex viewer."

I found that setting the correct code page at the beginning of the script is needed.
By chance, while playing with the NLS files, I had set my code page to 950.

When I ran GetMZBits again, it produced corrupt data.
Also, if you run the routine for setting the TAB variable under code page 950, the variable will hold the letter T instead of the tab character.

Setting the code page to 1252 or 65001 at the beginning of the script solves the problem.
I mention this because the routine claims to be "universal" but is currently code page dependent.

Re: New Universal method to get TAB, Carriage return and possibly all others

Posted: 24 Apr 2018 00:25
by sst
carlos wrote:
23 Apr 2018 16:24
"Check with your Hex viewer."

I found that setting the correct code page at the beginning of the script is needed.
By chance, while playing with the NLS files, I had set my code page to 950.

When I ran GetMZBits again, it produced corrupt data.
Also, if you run the routine for setting the TAB variable under code page 950, the variable will hold the letter T instead of the tab character.

Setting the code page to 1252 or 65001 at the beginning of the script solves the problem.
I mention this because the routine claims to be "universal" but is currently code page dependent.
Thank you carlos for pointing that out. Would you please check with this too:

Code: Select all

((for /L %%P in (1,1,70) do pause>nul)&set /p "TAB=")<"%COMSPEC%"
set "TAB=%TAB:~0,1%"
I wasn't able to switch to code page 950 on Win7 and XP; it gives the "Invalid code page" message. However, on Win10 code page 950 can be activated by chcp.
The above worked for me on code page 950. But I appreciate your feedback.

BTW, by Universal I meant universal across the different versions of Windows released so far, and I didn't consider the behavior under different code pages.
Anyway, I admit that putting the term Universal in the title was a silly idea and I will remove it asap. I will also edit the first post to reflect the facts about different code pages, and will replace or add alternate methods to cover different code pages if possible. But without more feedback, any new change or replacement of the method may turn out to be wrong again.

There was also a wrong statement when I said:
set /p stops reading from input when it encounters one of the <LF>, <CR> or <NULL> characters
From what I have found so far, when set /p is used in a pipe inside a FOR /F loop, it doesn't stop on any character: not <NULL>, not <CR>, not <LF>, and not <CRLF> or <LFCR>. It will read and remove the whole input buffer no matter what it contains, either to the end of it or up to the first 1023 bytes, whichever comes first. But the result of the read will be truncated at <NULL>, <CRLF> and <LFCR>.
On the other hand, with direct reading, (set /p "VAR=")<"FILE", set /p will stop reading from input if it encounters <CRLF> or <LFCR>. Otherwise the behavior is the same as above.
Of course I'm not talking about anything new; I'm just trying to clarify and correct my own mistake. This is based only on my own observation of the behavior, which is neither complete nor conclusive and may be wrong in some parts and details.
I'm still investigating this, but if someone could point me to more detailed and complete information regarding the behavior of set /p in different situations, that would be great.
From what I have learned so far, set /p can be used for fast forward seeking in 1023-byte chunks, with individual pause>nul calls for the remainder, to reach the desired offset in larger files.
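
The arithmetic behind such a seek plan can be sketched as follows. This is only an illustration: the target offset 2500 is a hypothetical value, the 1023-byte chunk size is the one observed above, and the variable names are made up.

```batch
@echo off
setlocal EnableExtensions

rem Hypothetical target offset, chosen only for illustration.
set "Offset=2500"
rem Chunk size that SET /P was observed to consume per read.
set "Chunk=1023"

rem Number of whole chunks, and the bytes left over for single skips.
set /a "Chunks=%Offset% / %Chunk%"
set /a "Rest=%Offset% %% %Chunk%"

echo Use %Chunks% "set /p" read(s), then %Rest% "pause>nul" skip(s).
```

For offset 2500 this yields 2 chunk reads (2046 bytes) followed by 454 single-byte skips.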

As for the NLS files, I've managed to extract the whole range 01-FF from the C_1252.NLS and C_950.NLS files and create an in-memory character table, but now I have to consider different code pages as well, so this will need more work.

OFF_TOPIC Clarification
In the first post, I created a section titled Comment For Begginers.
Putting the misspelling of Beginner aside,
I was trying to make the post more descriptive, but in a way that would not offend the experts by describing the obvious.
But then I realized that using the term Beginner may give the false impression that I'm declaring myself an expert or something on that level, which is not the case.
I'm not even remotely qualified for such titles.
I hope this clarifies things and prevents any potential misunderstanding in advance.
I should remove that term too.
It seems my post needs a lot of editing :oops:

Re: New Universal method to get TAB, Carriage return and possibly all others

Posted: 24 Apr 2018 02:24
by sst
I couldn't come up with a solution to handle different code pages without changing the active code page.
I think either it is not possible or it is beyond me.
I think changing the active code page is not good practice, so the only solution I could come up with is to save the current active code page, chcp to 1252, do the work and then restore it to what it was before.

Code: Select all

for /F "tokens=1-10" %%A in ('chcp') do (
    for /F %%Z in ("%%J %%I %%H %%G %%F %%E %%D %%C %%B %%A") do set "PreviousCodePage=%%Z"
)
>nul chcp 1252

:: Some code goes here

>nul chcp %PreviousCodePage%

It is based on the assumption that in any locale the last token is the code page number. I don't know whether there is a language in which the phrase "Active Code Page" is translated to more than 9 words, or which uses something other than spaces to delimit the words. So I assume that 10 tokens should cover all languages.

So this is the rewritten GetMZBits which incorporates this method. I've also changed the method of reading data from a pipe to a direct read.
Feedback is much appreciated. And again, thank you carlos.

Code: Select all


:: GetMZBits.cmd
:: corrected to handle different code pages correctly
:: This should produce the exact same output running from Win2k through Win10

@echo off
setlocal EnableExtensions

for /F "tokens=1-10" %%A in ('chcp') do (
    for /F %%Z in ("%%J %%I %%H %%G %%F %%E %%D %%C %%B %%A") do set "PreviousCodePage=%%Z"
)
>nul chcp 1252

call :GetMZBits
:: Now we have access to ASCII_xx vars


setlocal EnableDelayedExpansion
    set "prompt="
    for %%i in (01 03 04 09 0D 0E 90 1F B4 B8 BA CD FF) do set "prompt=!prompt!!ASCII_%%i!"
    "%COMSPEC%" /d /k<nul>"PE_MZ_Header_Special_Chars.bin"
endlocal

echo,
echo The non-printable chars from Standard MZ header of PE images have been extracted and written to "PE_MZ_Header_Special_Chars.bin"
echo,
echo It should have the sequence:
echo     0x01,0x03,0x04,0x09,0x0D,0x0E,0x90,0x1F,0xB4,0xB8,0xBA,0xCD,0xFF
echo,
echo Check with your Hex viewer.
pause

>nul chcp %PreviousCodePage%
exit /b

:GetMZBits
setlocal EnableDelayedExpansion
set "q=""
(set \n=^%===<no_white_spaces_allowed_before_or_after_me>===%
%============<no_white_spaces_allowed_before_or_after_me>===%
%============<no_white_spaces_allowed_before_or_after_me>===%)
set "@{DeferExecute}=(endlocal!\n!"

for /F "usebackq tokens=1-3 delims=, " %%1 in ('
     2,0,90,!\n!
     4,0,03,!\n!
     8,0,04,!\n!
    13,0,FF,!\n!
    64,4,0E1FBA,!\n!
    69,4,B409CD,!\n!
    73,2,B801
') do (
    ((for /L %%P in (1,1,%%1) do @pause>nul)&set /p "Capture=")<"%COMSPEC%"
    set "Index=%%3"
    for /L %%i in (0,2,%%2) do (
        set /a "j=%%i/2"
        for %%j in (!j!) do (
            set "@{DeferExecute}=!@{DeferExecute}!set ASCII_!Index:~%%i,2!=!Capture:~%%j,1!!\n!"
        )
    )
)
set "@{DeferExecute}=!@{DeferExecute}!for /F !q!tokens=1* delims=.!q! %%Y in ('type ^^!q!%COMSPEC%^^!q!^^|^^(^^(for /L %%P in ^^(1^^,1^^,78^^) do @pause^^>nul^^)^^&2^^>nul ^^!q!%SystemRoot%\system32\findstr.exe^^!q! /B /C:^^!q!This program cannot be run in DOS mode.^^!q!^^)') do set ASCII_0D=%%Z!\n!)"
%@{DeferExecute}%
exit /b

Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 26 Apr 2018 14:01
by sst
carlos wrote:
21 Apr 2018 17:23
I find the file %windir%\system32\c_1252.nls that you mention very interesting, because it has almost all or all of the characters (I have not looked that closely).

The only problem is: even if we know how to handle that file, what guarantees that it will not change in the next version of Windows?
Nothing is guaranteed, but assuming certutil.exe remains available in future versions of Windows, checking the integrity of those files is a trivial task. In case of a mismatch, the script can fall back to the makecab method.
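
Such an integrity check could look roughly like the sketch below. It assumes that `certutil -hashfile` is present (it is not available on every older Windows version) and that the hash is printed on the second output line; the variable names are made up, and the expected value is the SHA1 listed for c_1252.nls in the first post.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion

set "File=%SystemRoot%\System32\c_1252.nls"
rem Expected hash as listed in the first post.
set "Expected=355e2ada0b9ea4f2a844c2d236d1b48336881b22"

rem Assumption: the second output line of certutil carries the hash.
set "Actual="
for /F "skip=1 delims=" %%H in ('certutil -hashfile "%File%" SHA1 2^>nul') do (
    if not defined Actual set "Actual=%%H"
)
rem Some certutil versions print the hash with spaces between bytes.
if defined Actual set "Actual=!Actual: =!"

if /i "!Actual!"=="%Expected%" (
    echo c_1252.nls matches the expected SHA1.
) else (
    echo c_1252.nls differs or certutil is unavailable, use a fallback method.
)
```

A missing certutil.exe or a missing file both end up in the fallback branch, since no matching hash line is captured.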
carlos wrote:
21 Apr 2018 17:23
Have you looked at the genchr routine for generating binary data? Does it work on Windows 2000? Here: viewtopic.php?t=5326#p32108
Yes, I can confirm that the makecab method works on Windows 2000.
They are not exactly the same: with the NLS method, the character table can be created directly in memory without the need to create files on disk.
But it is not as reliable as the makecab method, because we can't be sure what kind of changes will happen in future versions of Windows.
carlos wrote:
21 Apr 2018 17:23
Your code for setting the TAB variable is excellent. I like it.
Thank you carlos. You may try this instead.

Code: Select all

((for /L %%P in (1,1,70) do pause>nul)&set /p "TAB=")<"%COMSPEC%"
set "TAB=%TAB:~0,1%"
Anyway, I've finished working on the NLS files and will post the sample code in the next post.
I just did it for fun, but as it passed my tests on different versions of Windows, I have more confidence in using it for more serious work if I ever need to quickly create an in-memory character map.

Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 26 Apr 2018 15:14
by sst
So here it is: NLSAscii.cmd
Sample code to generate the complete ASCII table (except NULL) from the C_1252.NLS and C_950.NLS files.

Code: Select all


:: NLSAscii.cmd

:: NLS ASCII TABLE Generator. By sst
:: Sample batch script to demonstrate creation of in-memory ASCII table using NLS files.
:: Dependency:
:: C_950.NLS  SHA1: 1cd3f1ccf03d2b2d00dd3a133a6bebaa0e1bdb89
:: C_1252.NLS SHA1: 355e2ada0b9ea4f2a844c2d236d1b48336881b22

:: This script should produce the correct table on Win2K, WinXP, WinVista, Win7, Win8.x, Win10

:: Tested on Win2K SP4, WinXP SP3 32bit, Win7 SP1 64bit, Win10 1709 64bit


@echo off
setlocal EnableExtensions DisableDelayedExpansion

:: Set this to nonzero to additionally create one-byte chr files.
set /a "CreateChrFiles=0"

:: Taking into account the possibility of a different number of tokens in the output of "chcp" on localized versions of Windows.
:: 10 is overkill but surely safe.
for /F "tokens=1-10" %%A in ('chcp') do (
    for /F %%Z in ("%%J %%I %%H %%G %%F %%E %%D %%C %%B %%A") do set "PreviousCodePage=%%Z"
)
>nul chcp 437

call :CreateAsciiTableFromNLS ASCII_TABLE Msg
if errorlevel 1 ((echo,%Msg%)>&2 & exit /b %errorlevel%)


setlocal EnableDelayedExpansion

set "prompt=!ASCII_TABLE:$=$$!"
"%COMSPEC%" /a /d /k<nul>"ASCII_TABLE.dat"
set "prompt="

echo,
echo Ascii table has been created and loaded in to ASCII_TABLE environment variable.
echo The contents of ASCII_TABLE variable have been saved to the file "ASCII_TABLE.dat" in the current directory.
echo SHA1 hash value of the file should be: 8760d3807fb0c8ce8fd426f4cf72632edf8789f6
echo,

set "ASCII_TABLE=#!ASCII_TABLE!"
set "LF=!ASCII_TABLE:~10,1!"
set "CR=!ASCII_TABLE:~13,1!"
set "SUB=!ASCII_TABLE:~26,1!"
set "ChrSubDir=.\CHR_NLS"
if /i %CreateChrFiles% NEQ 0 (
    echo Press any key to create 256 one-byte chr files in "%ChrSubDir%" folder...
    pause>nul
    <nul set /p "=Please Wait...!CR!"
    if not exist "%ChrSubDir%\" MD "%ChrSubDir%"
    for /L %%i in (1,1,25) do (
       (echo !ASCII_TABLE:~%%i,1!%SUB%)>"%ChrSubDir%\%%i.tmp"
       copy /y "%ChrSubDir%\%%i.tmp" /a "%ChrSubDir%\%%i.chr" /b >nul
    )
    for /L %%i in (27,1,255) do (
       (echo !ASCII_TABLE:~%%i,1!%SUB%)>"%ChrSubDir%\%%i.tmp"
       copy /y "%ChrSubDir%\%%i.tmp" /a "%ChrSubDir%\%%i.chr" /b >nul
    )
    del /f /q "%ChrSubDir%\*.tmp" >nul 2>&1
    set "prompt=%SUB%"
    "%COMSPEC%" /a /d /k<nul>"%ChrSubDir%\26.chr"
    "%COMSPEC%" /u /d /k<nul>"%ChrSubDir%\0.tmp"
    "%COMSPEC%" /a /d /k<nul>>"%ChrSubDir%\0.tmp"
    type "%ChrSubDir%\0.tmp"|(pause>nul&findstr "^")>"%ChrSubDir%\0.tmp"
    copy /y "%ChrSubDir%\0.tmp" /a "%ChrSubDir%\0.chr" /b >nul
    del /f "%ChrSubDir%\0.tmp" >nul 2>&1
    echo Done.          !LF!
)
endlocal

>nul chcp %PreviousCodePage%

pause
exit /b 0



:CreateAsciiTableFromNLS <outResult> [outErrorMessage]
setlocal EnableDelayedExpansion

:: If the latest version of Dave Benham's RETURN.BAT is present on the system and is reachable through the PATH variable, set this to nonzero.
:: https://www.dostips.com/forum/viewtopic.php?f=3&t=6496&p=41929#p41929
set /a "USE.RETURN.BAT=0"


set "NLS950=%SystemRoot%\system32\C_950.NLS"
set "NLS1252=%SystemRoot%\system32\C_1252.NLS"

if "%~1"=="" exit /b 2
if not exist "%NLS950%"  (endlocal & 2>nul set "%~2=%0: "%NLS950%" does not exist." & exit /b 1)
if not exist "%NLS1252%" (endlocal & 2>nul set "%~2=%0: "%NLS1252%" does not exist." & exit /b 1)



:: Detection for Win2K is just for speed of reading file content
:: On Win2K, "SET /P" will read 1024 bytes of input buffer
:: On WinXP and later "SET /P" will read 1023 bytes of input buffer.
:: We need to know this to be able to seek to the correct offset.
:: Otherwise we have to use exclusively "pause>nul" to seek the input file
:: which is much slower for larger offsets
:: PS Note:
::   There is a difference between using SET /P inside a pipe eg: TYPE "BinaryFile" | SET /P "CAPTURE="
::   and using SET /P to directly read file contents eg: (SET /P "CAPTURE=")<"BinaryFile"
::   In pipe mode "SET /P" will always eat the first 1023 bytes of input (1024 bytes on Win2K), no matter what the content is
::   and the result will be truncated at <NULL>, <CRLF> and <LFCR>
::   In direct read mode, SET /P will eat the whole 1023 or 1024 bytes ONLY if it doesn't encounter one of the <CRLF> or <LFCR> pairs on the way,
::   in that case, SET /P will immediately stop reading from input upon reaching <CRLF> or <LFCR>
::   In both modes, all of the control characters (range 00-1F) which come immediately before <CRLF> or <LFCR> will be removed from the target variable.
::   Here, using the direct read mode to fast seek the NLS files is OK because there is no <CRLF> or <LFCR> in them.
::   But for the purpose of fast seeking arbitrary files, only pipe mode can be used.

for /F "tokens=3" %%V in ('ver') do (
    if "%%V"=="2000" (set "FastSeekBytes=1024") else (set "FastSeekBytes=1023")
)

set "@{FastSeek}=SET /P "=""

:: Win2K's cmd has a bug in which it expands variable values to zero in SET /A expressions,
:: if they are placed in the right side of the operands. Thus we have to expand them ourselves.
set /a "NLS950_SingleSeekBytes1=1782-%FastSeekBytes%"
set /a "NLS950_SingleSeekBytes2=1890-%FastSeekBytes%"
set /a "NLS1252_SingleSeekBytes=547"

(
    for /L %%P in (1,1,%NLS1252_SingleSeekBytes%) do pause>nul
    set /p "CAPTURE="
)<"%NLS1252%"
set "CHAR_01-7F=!CAPTURE:~0,127!"
set "CHAR_A0-FF=%CAPTURE:~159,96%"


set "CHAR_EF=!CHAR_A0-FF:~79,1!"
(
    %@{FastSeek}%
    for /L %%P in (1,1,%NLS950_SingleSeekBytes1%) do pause>nul
    REM Equivalent to: for /L %%P in (1,1,1782) do pause>nul

    set /p "CAPTURE="
)<"%NLS950%"
set "CHAR_80-93=!CAPTURE:%CHAR_EF%=!"
set "CHAR_80-93=!CHAR_80-93:~0,20!"

(
    %@{FastSeek}%
    for /L %%P in (1,1,%NLS950_SingleSeekBytes2%) do pause>nul
    REM Equivalent to: for /L %%P in (1,1,1890) do pause>nul

    set /p "CAPTURE="
)<"%NLS950%"
set "CHAR_94-9F=!CAPTURE:%CHAR_EF%=!"
set "CHAR_94-9F=!CHAR_94-9F:~0,12!"
set "CAPTURE="

set "Result=!CHAR_01-7F!!CHAR_80-93!!CHAR_94-9F!!CHAR_A0-FF!"

if /i %USE.RETURN.BAT% NEQ 0 (
    2>nul call RETURN.BAT Result %~1 0 || (endlocal & 2>nul set "%~2=%0: RETURN.BAT not found. set USE.RETURN.BAT to 0 then try again." & exit /b 1)
) else (
    set "tmpTable=."
    if defined Temp if exist "%Temp%\" set "tmpTable=%Temp%"
    set "tmpTable=!tmpTable!\_nls_ascii_table_%RANDOM%.tmp"
)
if "%USE.RETURN.BAT%" NEQ "0" (endlocal & 2>nul set "%~2=%0: RETURN.BAT malfunctioned or wrong RETURN.BAT called. set USE.RETURN.BAT to 0 then try again." & exit /b 1)

(
    (echo !Result!)>"%tmpTable%"
    endlocal
    (set /p "%~1=")<"%tmpTable%"
    del /f "%tmpTable%" >nul 2>&1
    exit /b 0
)


Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 26 Apr 2018 17:58
by carlos
@sst I can't get the new NLSAscii.cmd to work correctly using code page 950.

Also, this does not work using code page 950:

Code: Select all

((for /L %%P in (1,1,70) do pause>nul)&set /p "TAB=")<"%COMSPEC%"
set "TAB=%TAB:~0,1%"
The difference between using input redirection and using the Type command (both use the DPATH environment variable to find the file) is that the Type command internally converts the content of the file from multi-byte, using the current code page, to Unicode.
I think both use the current code page to interpret it, but type converts to Unicode before interpreting.
Thus, it is always necessary to use the proper input code page. Chcp alters the input code page and the output code page (at the same time, but they are two different things).

I think the problem may be that filtering a Unicode character with the pause command is not the same as filtering a code point of some code page with the pause command, plus the trimming that set /p does in cmd versions since Windows 7.

To save the current code page you can use this:

Code: Select all

For /f "tokens=2 delims=:" %%G in ('CHCP') do Set _codepage=%%G
As I remember, the : delimiter is there to support the output of the chcp command on German Windows.
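
Put together with the save/restore idea from earlier in the thread, a colon-based variant might look like the sketch below. The variable name SavedCP is made up, the extra "." delimiter is an assumption for localized outputs that end with a period, and the code page 1252 is just the one used earlier in this thread.

```batch
@echo off
setlocal EnableExtensions

rem Take the text after the colon, then let a second FOR /F trim spaces.
rem The extra "." delimiter is an assumption for outputs ending with a period.
set "SavedCP="
for /F "tokens=2 delims=:." %%G in ('chcp') do for /F %%N in ("%%G") do set "SavedCP=%%N"
if not defined SavedCP (
    echo Could not parse the chcp output.>&2
    exit /b 1
)

chcp 1252 >nul
rem ... code that needs a single-byte code page goes here ...
chcp %SavedCP% >nul
```

If the localized output has no colon at all, nothing is captured and the script stops before touching the code page.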

Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 26 Apr 2018 22:47
by sst
carlos wrote:
26 Apr 2018 17:58
@sst I can't get the new NLSAscii.cmd to work correctly using code page 950.
No, it doesn't. I used chcp 437 for that very reason; I think you changed that line to chcp 950 yourself. Otherwise it should work.

I have no experience with handling different code pages, so I don't know whether it can be done without the help of chcp.

And for the TAB character, this worked for me on Windows 10 with code page 950:

Code: Select all

:: TabTest950.cmd
@echo off

type "%~f0"

setlocal
ver
chcp 950
((for /L %%a in (1,1,70) do pause>nul)&set /p "TAB=")<"%COMSPEC%"
set "TAB=%TAB:~0,1%"

set "TabTest=TAB--^>[%TAB%]^<--TAB  SPACE--^>[ ]^<--SPACE  EMPTY--^>[]^<--EMPTY

echo,
echo TabTest echo %TabTest%
(echo TabTest file %TabTest%)>tab950.txt
type tab950.txt
:: End of Script.

:: Result goes here

OUTPUT (ScreenShot)
Image
carlos wrote:
26 Apr 2018 17:58
plus the trimming that set /p does in cmd versions since Windows 7.
The trim is applied to the prompt part of SET /P, not to the variable.
carlos wrote:
26 Apr 2018 17:58
As I remember, the : delimiter is there to support the output of the chcp command on German Windows.
It is the same on English Windows. But I had no idea what the output of chcp may look like in, say, Chinese or Korean or other locales, and whether they also use : as the delimiter. That is why I used that weird form to save the code page. It seems that I'm complicating things unnecessarily.

Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 27 Apr 2018 07:53
by carlos
@sst Your code works correctly. But I found the reason why it does not work on my PC, even when using chcp.

This was the output:
Image

The cause of the problem was a registry key, which I exported to this:

cp_950.reg

Code: Select all

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Console\%SystemRoot%_system32_cmd.exe]
"CodePage"=dword:000003b6

Even when I change the code page to, for example, 850 or 1252 and close the cmd window, a new cmd instance starts with code page 950. I don't know why this affects input redirection. Even after changing the code page to something other than 950, that registry key was the cause of the problem.
I removed that registry key and ran the code in a new cmd instance, and all was fine, like this:

Image

The new question now is how to avoid possible problems in the script when that registry key is present. I'm investigating that.
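
Detecting the key, at least, is straightforward. The sketch below only reports whether the per-user CodePage value from the exported .reg file exists; it assumes reg.exe is available and does not try to work around the effect of the key. The variable name Key is made up.

```batch
@echo off
setlocal EnableExtensions

rem The doubled percent signs keep the literal SystemRoot placeholder
rem that is part of the key name in the exported .reg file.
set "Key=HKCU\Console\%%SystemRoot%%_system32_cmd.exe"

reg query "%Key%" /v CodePage >nul 2>&1
if errorlevel 1 (
    echo No per-user CodePage override found for cmd.exe.
) else (
    echo A CodePage override is present:
    reg query "%Key%" /v CodePage
)
```

The value is shown by reg.exe in hexadecimal; 0x3b6 corresponds to code page 950.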

Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 27 Apr 2018 15:57
by sst
carlos wrote:
27 Apr 2018 07:53
The new question now is how to avoid possible problems in the script when that registry key is present. I'm investigating that.
That is interesting. So I took the time to find out what exactly is going on.
I'm afraid there is nothing we can do about it.
Contrary to what it seems, your approach of setting the code page through the registry is the correct way of doing it, and it yields the true results for running batch code under different code pages.
Setting the code page for cmd through the registry has the same effect as changing the system locale through the Regional and Language applet of the Control Panel.

It turned out that chcp only affects the console window and the operations that are related to it, like redirection to/from file, but setting code page by registry;or changing system locale globally by control panel, has deeper effects on more low level functions of the cmd. It affects all read/write operations that are performed in text mode, including copy /a

So when for example, a Chinese guy sits in front of his computer and opens cmd console windows, he will not manually switch to code page 932 or 950 or the like, by typing chcp [MyChineseCodePage], It is the system locale that enforces the active code page, so he gets the wrong results. and even chcp to other code pages can't help him either, unless he change his system locale or, change the cmd's code page through registry, and sets it to a single byte code page. Thus chcp is completely useless to test code behavior under different code pages.

The fact that chcp can't be used to alter the underlying code page that is in use at the process level is disappointing. This means that working with extended ASCII characters in batch scripts under multi-byte locales is doomed to failure.
On the other hand, working under single-byte code pages is not problematic as long as we read and write with the same code page.


So even the makecab genchr is not universal in this context.
Just change your system locale to Chinese Traditional, or set cmd's code page in the registry to 932 or 950, then run makecab genchr and see what happens to the extended character range of the generated chr files.
Image

This is what happens with makecab genchr (I used debenam's code as the test case: viewtopic.php?f=3&t=5326&start=60#p32394):
Image

Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 27 Apr 2018 17:29
by carlos
Yes, I checked that it affects the copy command in text mode (it sometimes ignores the 1A character as end of file). I think that maybe the character before the 1A character can be interpreted as a special code point in some multibyte code page, so that the following 1A character loses its meaning as end of file in text mode.
This would be the reason why some chr files are not "cut" by copy /a. It is not all of the chr files, only some.

It would be very useful to find a solution.

Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 29 Apr 2018 02:22
by jfl
the contents of every 32 and 64 bit windows PE Image(Executables, DLLs, OCX,...) starts with a tiny DOS stub header known as MZ header which have enough logic to display the message "This program cannot be run in DOS mode."
Every 32 and 64 bit PE image does indeed start with a DOS stub, but the DOS stub that you describe is not universal. It's just the default DOS stub provided with the Microsoft linker.
I usually build my system management programs for DOS and Windows, and use the DOS version of the program as the DOS stub for the WIN32 version.
This way the same .exe contains both the 16-bit and 32-bit versions of the program, and it'll work in _ALL_ versions of DOS and Windows, from 16-bit DOS 3 to 64-bit Windows 10.
Try the programs there if you don't believe this is possible :-)

Admittedly, very few people probably use their own stubs like I do.
But all programs built with non-Microsoft tool chains (For example MinGW) probably have their own distinct stub.

Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 29 Apr 2018 09:41
by sst
jfl wrote:
29 Apr 2018 02:22
Try the programs there if you don't believe this is possible :-)

Admittedly, very few people probably use their own stubs like I do.
But all programs built with non-Microsoft tool chains (For example MinGW) probably have their own distinct stub.
I believe you. As for making a fully functional hybrid DOS/Windows executable, I did that 18 years ago, in the days of Windows 98.

I didn't claim that the DOS stub is universal or cannot be changed. What I suggested was that Microsoft has not decided to change the default DOS stub produced by its linker, and may have no reason to do so in the future. That stub is already small enough, and saving a few bytes may not be worth the risk of breaking compatibility with utilities that incorrectly depend on that particular stub.
Of course these are only speculations.

Batch is a different story. Considering its limitations, one may resort to everything one can to obtain a single character that is easily available in other languages/scripts.

Re: Alternate method to get TAB, Carriage return and possibly all others

Posted: 30 Apr 2018 14:09
by sst
carlos wrote:
27 Apr 2018 17:29
It would be very useful to find a solution.
@carlos
I've found that, after issuing the chcp command, starting a new cmd instance on the same console resolves the code page issue for that new cmd instance.
So for the particular version of genchr that I used as the test case in the previous post, putting chcp 437 at the start of the script would create correct chr files, because it offloads all of its work to new instances of cmd on the same console. Otherwise the script needs to relaunch itself in a new instance of cmd after changing the active code page.
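
A minimal sketch of the "relaunch itself" idea, assuming the behavior described above holds (a cmd instance started after chcp picks up the new code page). The marker variable _CP_RELAUNCHED and the placeholder body under :main are hypothetical and only serve the illustration; code page 437 is the one mentioned above.

```batch
@echo off
setlocal
rem _CP_RELAUNCHED is a hypothetical marker variable, used only in this sketch.
if defined _CP_RELAUNCHED goto :main

rem First pass: switch the console to a single-byte code page,
rem then run this same script again in a new cmd instance on the same console.
set "_CP_RELAUNCHED=1"
chcp 437 >nul
cmd /c ""%~f0" %*"
exit /b %errorlevel%

:main
rem Second pass: this part runs in the new cmd instance.
rem The real work of the script, such as generating the chr files, would go here.
echo Running in a new cmd instance
exit /b 0
```

Note that this sketch leaves the console on code page 437 when it finishes. Restoring the previous code page is left out here, because parsing the output of chcp depends on the display language.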