shokarta wrote: ↑31 Aug 2020 00:59
1) What does the "U_" at the beginning stand for?
It has no special purpose: my first name starts with a U, and that way I could list both variables at once using the command "set U_" (without the double quotes).
Also, the two names describe data with different semantics, which is probably very confusing (sorry for that):
U_0005 contains the Unicode character U+0005 and was meant to be the character that codepage 850 maps to byte 0x05 (*).
U_4021_009A contains two Unicode characters whose UCS-2 representation is the byte sequence 40 21 00 9A, so these are U+2140 and U+9A00.
(*) A look at codepage 850 reveals that I used the wrong value there: the correct character I should have used instead is U+2663, see:
https://de.wikipedia.org/wiki/Codepage_850
Luckily it works, which I suspect depends on faulty behaviour of cmd.exe, so it might stop working at any time if MS fixes that.
You can look up Unicode characters here:
https://www.compart.com/de/unicode/U+2663
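To make the naming of U_4021_009A concrete, here is a small Python sketch (not part of the batch solution, and the function name is made up for illustration) that reads a byte sequence two bytes at a time, low byte first, the way UCS-2 little-endian does:

```python
from typing import List


def utf16le_code_units(data: bytes) -> List[str]:
    """Read the bytes in pairs, low byte first, and name each code unit."""
    if len(data) % 2:
        raise ValueError("UCS-2 data needs an even number of bytes")
    return [
        "U+%04X" % int.from_bytes(data[i:i + 2], "little")
        for i in range(0, len(data), 2)
    ]


# The bytes 64, 33, 0, 154 from the question, written in hex.
print(utf16le_code_units(bytes.fromhex("4021009A")))  # ['U+2140', 'U+9A00']
```

So the byte pair 40 21 is the single character U+2140, and 00 9A is U+9A00.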
shokarta wrote: ↑31 Aug 2020 00:59
2) As I'm writing bytes (5,64,33,0,154), from your lines I see you are actually passing only 3 bytes:
No. Those are three UCS-2 characters, and each of them is 2 bytes long.
The command "cmd /a /c..." starts a new cmd.exe instance which outputs characters in ANSI encoding.
The command "cmd /u /c..." starts a new cmd.exe instance which outputs characters in UCS-2 encoding.
So the first "cmd /a" I used outputs the byte 0x05 (by accident), while the other command sends 2 UCS-2 characters to stdout (which was redirected to COM3).
shokarta wrote: ↑31 Aug 2020 00:59
In the end, this is probably just a matter of me not understanding it line by line, so may I kindly ask you to break it down so I can understand?
The idea is to use UTF-7's modified base64 encoding to load Unicode characters into environment variables.
Single characters are then mapped by codepage 850 to the byte that should be sent to COM3 (using cmd /a).
Double-byte characters are sent to COM3 using UCS-2 encoding.
To get the base64 strings you need, you just have to reverse the process.
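Reversing the process by hand is tedious, so here is a rough Python sketch of the arithmetic, meant as an offline helper and not as part of the batch file. The function names are hypothetical. It assumes the bytes you want on the wire are UCS-2 little-endian and that UTF-7 base64-encodes each 16-bit code unit high byte first.

```python
import base64


def utf7_for_ucs2_bytes(wire: bytes) -> str:
    """Return the '+...-' string for the characters whose UCS-2
    little-endian bytes are `wire`."""
    if len(wire) % 2:
        raise ValueError("need an even number of bytes")
    for i in range(0, len(wire), 2):
        if wire[i:i + 2] == b"\x00\x00":
            raise ValueError("U+0000 cannot be stored in a variable")
    # Swap every byte pair: the wire order is low byte first, but the
    # base64 part of UTF-7 takes each code unit high byte first.
    big_endian = bytes(wire[i ^ 1] for i in range(len(wire)))
    body = base64.b64encode(big_endian).decode("ascii").rstrip("=")
    return "+" + body + "-"


def utf7_for_char(code_point: int) -> str:
    """Same, for one character in the range U+0001..U+FFFF."""
    return utf7_for_ucs2_bytes(code_point.to_bytes(2, "little"))


print(utf7_for_char(0x2663))  # +JmM-
print(utf7_for_ucs2_bytes(bytes.fromhex("6110 008A 0561 1100 8905 6112 0088")))
# +EGGKAGEFABEFiRJhiAA-
```

The trailing "=" padding of ordinary base64 is dropped because UTF-7 does not use it.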
Note that this method can't use the NULL character U+0000.
If your byte sequences vary, then in some cases you can avoid the problem by splitting the string in the right places; in others it is impossible:
- possible sample (hex): 01 02 00 00 03 04 --> split into: 01, 02 00, 00 03, 04
- impossible sample (hex): 00 00 01 02
You should always be aware of that issue, so I would still make use of the file "0.chr".
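The splitting rule can be sketched in Python as a small search (again an offline helper with a made-up name, not part of the batch file). It assumes that a piece is either one non-zero byte (sent through the ANSI path) or two bytes that are not both zero (sent as one UCS-2 character), and that every non-zero single byte can actually be produced through the codepage, which is not guaranteed.

```python
from typing import List, Optional


def split_without_nul(data: bytes) -> Optional[List[bytes]]:
    """Split `data` into 1-byte and 2-byte pieces so that no piece is
    a lone 00 or the pair 00 00. Returns None if no such split exists."""
    n = len(data)
    # best[i] holds a valid split of data[i:], or None if there is none.
    best: List[Optional[List[bytes]]] = [None] * (n + 1)
    best[n] = []
    for i in range(n - 1, -1, -1):
        if data[i] != 0 and best[i + 1] is not None:
            best[i] = [data[i:i + 1]] + best[i + 1]
        elif (i + 2 <= n and data[i:i + 2] != b"\x00\x00"
              and best[i + 2] is not None):
            best[i] = [data[i:i + 2]] + best[i + 2]
    return best[0]


def show(hex_string: str) -> str:
    parts = split_without_nul(bytes.fromhex(hex_string))
    if parts is None:
        return "impossible"
    return ", ".join(" ".join("%02X" % b for b in p) for p in parts)


print(show("01 02 00 00 03 04"))  # 01, 02 00, 00 03, 04
print(show("00 00 01 02"))        # impossible
```

Each 1-byte piece would go into a "cmd /a" call and each 2-byte piece into a "cmd /u" call, in the same order as the pieces.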
Your example could be processed like this:
Code:
@echo off
setlocal enableExtensions enableDelayedExpansion
for /f "tokens=*" %%a in ('chcp') do for %%b in (%%a) do set "cp=%%~nb"
:: define UTF16-LE characters using UTF-7
:: result byte sequence (decimal) : 5, 97, 16, 0, 138, 5, 97, 17, 0, 137, 5, 97, 18, 0, 136
:: result byte sequence (hex) : 05 61 10 00 8A 05 61 11 00 89 05 61 12 00 88
:: possible split into code values: 05, 6110 008A 0561 1100 8905 6112 0088
:: apply ANSI and UCS_2 encoding : U+2663, U+1061 U+8A00 U+6105 U+0011 U+0589 U+1261 U+8800
:: nibble sequence : 2 6 6 3 ; 1 0 6 1 8 A 0 0 6 1 0 5 0 0 1 1 0 5 8 9 1 2 6 1 8 8 0 0
:: bit sequence (+padding: _) : 0010 0110 0110 0011__; 0001 0000 0110 0001 1000 1010 0000 0000 0110 0001 0000 0101 0000 0000 0001 0001 0000 0101 1000 1001 0001 0010 0110 0001 1000 1000 0000 0000__
:: base 64 bit sequence : 001001 100110 001100; 000100 000110 000110 001010 000000 000110 000100 000101 000000 000001 000100 000101 100010 010001 001001 100001 100010 000000 000000
:: base 64 encoding : J m M; E G G K A G E F A B E F i R J h i A A
:: base64 string: +JmM- +EGGKAGEFABEFiRJhiAA-
>nul chcp 65000
set "ANSI_1=+JmM-"
set "UCS2_1=+EGGKAGEFABEFiRJhiAA-"
>nul chcp 850
(
cmd /a /c"<nul set /p "=!ANSI_1!""
cmd /u /c"<nul set /p "=!UCS2_1!""
) >> COM3
>nul chcp %cp%
goto :eof
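To double-check the two base64 strings without a serial port, the comment block of the script can be replayed in Python. This is only a sketch with hypothetical names: it decodes the "+...-" strings by hand, and it takes the ANSI mapping of U+2663 to byte 0x05 as an assumption (a one-entry table) rather than asking Windows.

```python
import base64
from typing import List

# Assumption from the post: under codepage 850, cmd /a writes U+2663 as 0x05.
ASSUMED_ANSI_BYTE = {0x2663: 0x05}


def utf7_run_to_code_units(run: str) -> List[int]:
    """Decode one '+...-' run into its 16-bit code units."""
    body = run[1:-1]  # drop the leading '+' and the trailing '-'
    raw = base64.b64decode(body + "=" * (-len(body) % 4))
    raw = raw[:len(raw) - len(raw) % 2]  # ignore leftover padding bits
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def expected_wire_bytes(ansi_run: str, ucs2_run: str) -> bytes:
    """Bytes expected on COM3: ANSI part first, then UCS-2 little-endian."""
    ansi = bytes(ASSUMED_ANSI_BYTE[u] for u in utf7_run_to_code_units(ansi_run))
    ucs2 = b"".join(u.to_bytes(2, "little")
                    for u in utf7_run_to_code_units(ucs2_run))
    return ansi + ucs2


wire = expected_wire_bytes("+JmM-", "+EGGKAGEFABEFiRJhiAA-")
print(" ".join("%02X" % b for b in wire))
# 05 61 10 00 8A 05 61 11 00 89 05 61 12 00 88
```

The printed bytes match the "result byte sequence (hex)" comment in the script.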
penpen