how to pass/read unicode char on command line/batch script
Moderator: DosItHelp
Please help me.
I have a batch script that reads a properties file containing some special Unicode characters.
How can I read those characters correctly so that I can use them for other operations?
Thanks
vJoy
Re: how to pass/read unicode char on command line/batch scri
Code: Select all
help cmd|find "/U"
Re: how to pass/read unicode char on command line/batch scri
Hi,
Is it possible, if the OS locale is English, to pass Japanese characters on the command line, provided all the language fonts are installed on the system?
If yes, please let me know how to pass the correct value via cmd.
Please help.
Thanks in advance
VJoy
-
- Expert
- Posts: 442
- Joined: 01 Aug 2010 17:13
- Location: Canadian Pacific
- Contact:
Re: how to pass/read unicode char on command line/batch scri
I did extensive research into unicode in relation to Command Prompt for my own purposes recently.
It can receive unicode arguments from explorer.exe/Windows OS.
It can write unicode to a text file in UTF-16LE format. Use "cmd /u" and redirect ">" / ">>". But...
It can't read any unicode file properly, even ones it encoded itself in UTF-16LE. Some latin-based languages might display ok, but extended unicode like Asian languages is impossible.
Command Prompt can work with unicode characters, representing them with an odd symbol, and it knows that two unicode characters are different. You can even retrieve unicode characters from folder/file paths using "for". (In fact I built an intuitive best-case-scenario script for retrieving the unicode from wildcard-containing paths without error.) ...but that's it.
It can only display the characters provided through the ANSI codepage. Yet this has no effect on its ability to handle unicode. It's really only meant for the basic Roman alphabet and command characters for dealing with file and folder paths, not text.
Code: Select all
chcp /?
That is to say, changing locale or anything on the OS will have no effect. As I explained, Command Prompt relies on ANSI codepages.
So what else can be done? I tried to get people who can program in C/C++ to write a tool that works like "type" but processes unicode as the Windows OS does when passing an argument, but failed due to a general lack of knowledge. This would at least allow users to work with unicode characters stored in text files.
I believe it's possible to fix this since Windows can pass unicode and users can paste unicode onto the command line, but it's still beyond my capabilities. I hope this helps you understand the problem.
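For anyone who just needs to get at Unicode text stored in a file, here is a rough sketch of the "type"-like reader described above, written in Python rather than C/C++. It is only an illustration: the names decode_with_bom and read_text are made up for this example, and the cp1252 fallback for files without a BOM is an assumption, not something cmd itself does.

```python
import codecs

# BOM signatures, longest first so a UTF-32 LE BOM (FF FE 00 00)
# is not mistaken for a UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes using the BOM if one is present, else the fallback code page.

    The fallback is an assumption for this sketch; pick the code page
    that matches your own system.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(fallback)


def read_text(path: str, fallback: str = "cp1252") -> str:
    """Read a whole file and return its text, honouring a leading BOM."""
    with open(path, "rb") as f:
        return decode_with_bom(f.read(), fallback)
```

Calling read_text on a UTF-16LE or UTF-8 file that starts with a BOM should give back the original characters, which a script can then process or re-encode as it likes.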
Re: how to pass/read unicode char on command line/batch scri
orange_batch wrote:I did extensive research into unicode in relation to Command Prompt for my own purposes recently.
It can receive unicode arguments from explorer.exe/Windows OS.
It can write unicode to a text file in UTF-16LE format. Use "cmd /u" and redirect ">" / ">>". But...
It can't read any unicode file properly, even ones it encoded itself in UTF-16LE. Some latin-based languages might display ok, but extended unicode like Asian languages is impossible.
It can write an OEM string to a Unicode file in UTF-8 format like this
(866 is the default code page for my Russian locale; a BOM may be created via the Set/P command):
Code: Select all
set LINE=Some localized OEM text
set FILE=test.txt
CHCP 65001|>>%FILE% Echo %LINE%&CHCP 866
The TYPE command can be used to read UTF-16/UTF-8 files with a BOM and convert them to OEM/ANSI.
Re: how to pass/read unicode char on command line/batch scri
Ah, so it will also output UTF-8.
amel27 wrote:TYPE command may use for read UTF16/UTF8 files with BOM for convert to OEM/ANSI
Example please?
Re: how to pass/read unicode char on command line/batch scri
orange_batch wrote:amel27 wrote:TYPE command may use for read UTF16/UTF8 files with BOM for convert to OEM/ANSI
Example please?
Of course, that was a mistake for UTF-8, but it works for UTF-16 with a BOM:
Code: Select all
(for /f "delims=" %%a in ('type utf16.txt') do echo.%%a
)>oem.txt
Re: how to pass/read unicode char on command line/batch scri
*scratches head*
So Command Prompt can write UTF-8 files, and type can read UTF-8 files when the codepage is 65001 (chcp 65001). I'm still unable to set the unicode output of type though... Is there any way?
Re: how to pass/read unicode char on command line/batch scri
Code: Select all
cmd /u /c type oem.txt > utf.txt
Re: how to pass/read unicode char on command line/batch scri
That just converts certain OEM to UTF-8...
If you want to see what I mean, try experimenting with this Japanese character: き
Under any and all circumstances (codepages, unicode mode, whatever), paste it into Command Prompt, echo it to a text file, type it back, try to set it to a variable with:
Code: Select all
for /f "delims=" %x in ('type utf.txt') do @set myvar=%x
As I just discovered, you can properly write and read a UTF-8 file under codepage 65001, but as usual, for is unable to see the unicode from type. Strangely, you can paste unicode into for just fine though. Redirection doesn't work either... type must do some kind of look-up when handling unicode that for doesn't wait for.
Re: how to pass/read unicode char on command line/batch scri
orange_batch wrote:That just converts certain OEM to UTF-8...
hmm... interesting, for me it converts OEM to UTF-16, but without a BOM (byte order mark). The BOM can be written to the file before the text output.
orange_batch wrote:If you want to see what I mean, try experimenting with this Japanese character
I think that before experimenting, the destination locale must be set as the default in the OS, and this character must be present in the OEM charset for the destination language.
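To make the BOM point concrete, here is a small Python sketch of putting the UTF-16LE byte order mark (the two bytes FF FE) in front of BOM-less UTF-16LE data. The helper name add_utf16le_bom is invented for this example, and the data is built in memory rather than taken from a real cmd /u output file.

```python
import codecs


def add_utf16le_bom(data: bytes) -> bytes:
    """Return data with a UTF-16 LE BOM in front, unless one is already there."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data
    return codecs.BOM_UTF16_LE + data


# BOM-less UTF-16 LE bytes, standing in for redirected cmd /u output.
raw = "き".encode("utf-16-le")
marked = add_utf16le_bom(raw)

# The generic "utf-16" codec uses the BOM to pick the byte order
# and strips it from the decoded text.
print(marked.decode("utf-16"))
```

With the BOM in place, editors and tools that sniff the first bytes have a chance of recognising the file as UTF-16LE instead of guessing.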
Re: how to pass/read unicode char on command line/batch scri
Er sorry, whatever UTF it converted to. It's irrelevant to the main problem. But now I'm convinced I'm right about that problem.
Re: how to pass/read unicode char on command line/batch scri
Hi,
orange_batch wrote:As I just discovered, you can write and read properly a UTF-8 file under codepage 65001, but as usual for is unable to see the unicode from type. Strangely, you can paste unicode into for fine though. Redirection doesn't work either... type must do some kind of look-up when doing unicode that for doesn't wait for.
I see several questions.
1. display unicode
2. Working with a fix batch-text inside your batch
3. Working with unicode in a for-loop
4. redirecting unicode to another (unicode)text file
5. comparing characters/internal representation
1. As far as I can tell, type can display unicode files regardless of cmd /u or cmd /a, and the codepage is also irrelevant. Only UTF-16 Little Endian files seem to work.
But the font of the cmd window is important: set it to Lucida Console and you get more characters, but not all.
That's simply because they are missing from Lucida Console.
So far I have not been able to activate another font for my cmd window (I tried "Arial Unicode MS" and "Courier New" in the registry).
2. Not tested yet.
3. Works for me with a Unicode Little Endian file, but it's necessary to set the right codepage before the FOR starts, and it only works with type, not directly with a file.
It works with cmd /a but not with cmd /u (that creates a file without a BOM, perhaps in UTF-32 format; it is much longer than the other file).
Code: Select all
(
del u16_65001.txt 2> nul
chcp 65001 > nul
rem Not necessary, but doesn't destroy anything
copy bom_utf8.txt u16_65001.txt
for /F "usebackq delims=" %%a in (`type unicode_L16.txt`) do (
echo %%a >> u16_65001.txt
)
chcp 1252 > nul
)
The opening parenthesis has to come before the codepage is changed to 65001, otherwise my batch stops immediately after chcp 65001.
Only the redirected output worked; a simple echo displays garbage.
4. Redirecting seems to work, see 3. Only UTF-8 files without a BOM are created successfully. You can prefix them with your own BOM file, but my text editor doesn't show any difference.
5. The unicode characters seem to be represented as multiple bytes, not as a single character.
So a string like "AﮓBﮧC" has a length of 9, because both unicode chars are represented by 3 bytes each (no, I can't read it).
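The length of 9 in point 5 matches what UTF-8 encoding would give. As a quick cross-check outside of cmd, this Python sketch counts the same string in characters, UTF-8 bytes and UTF-16LE bytes; it shows the encoding arithmetic only, not what cmd does internally.

```python
text = "AﮓBﮧC"

chars = len(text)                          # number of Unicode characters
utf8_bytes = len(text.encode("utf-8"))     # 1 byte per ASCII letter, 3 per Arabic form
utf16_bytes = len(text.encode("utf-16-le"))  # 2 bytes per character here

print(chars, utf8_bytes, utf16_bytes)
```

Three ASCII letters at one byte each plus two presentation-form characters at three bytes each gives the 9 bytes seen under codepage 65001.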
my testfile
Code: Select all
СРРР
Њ-Ћ-
абвгд
ежзийк-
ﭱﭲﭳ
ﭴﭵﭶﭷ
ﭸﭹﭺ
ﭻﭼﭽ
ﭾﭿﮓ
AﮧBﮦCﯛ
ﯚﱞﱟﮰ-
Waiting for more information
jeb