Bug cmd windows 10 encoding utf-8 input redirection

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile

#1 Post by carlos » 26 Apr 2018 18:22

Hello, I found that cmd.exe internally interprets the content of an input redirection improperly.

I have this file, in.txt, encoded as UTF-8:

Code:

ñandú

If I run these commands:

Code:

chcp 65001
(set /p "content=") < in.txt
echo %content%

it outputs:

Code:

nandu

It improperly converts the ñ to n and the ú to u.
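
A minimal sketch of the steps above as a single batch script, assuming in.txt sits in the current directory and holds one line of UTF-8 text. The variable name oldcp is illustrative and not from the original post. The script saves the active code page first and restores it at the end, so the rest of the console session is not left on 65001.

```batch
@echo off
setlocal

rem Save the active code page. The chcp output is localized, so take the
rem text after the colon and strip spaces (a trailing period is a delimiter).
for /f "tokens=2 delims=:." %%A in ('chcp') do set "oldcp=%%A"
set "oldcp=%oldcp: =%"

rem Switch to UTF-8 and read the first line of in.txt through redirection.
chcp 65001 >nul
set "content="
(set /p "content=") < in.txt
echo(%content%

rem Restore the code page that was active before.
chcp %oldcp% >nul
endlocal
```

Note that set /p reads only the first line of the redirected file, which is enough for this one-line test.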

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Bug cmd windows 10 encoding utf-8 input redirection

#2 Post by Squashman » 26 Apr 2018 18:26

Code:

Unicode
The CMD Shell can redirect ASCII/ANSI (the default) or Unicode (UCS-2 le) but not UTF-8.
This can be selected by launching CMD /A or CMD /U

With the default settings a UCS-2 file can be converted by redirecting it (note it's the redirection not the TYPE/MORE command that makes the encoding change)
TYPE unicode.txt > asciifile.txt

European characters like ABCàéÿ will usually convert correctly, but others like £¥ƒ€ will become random extended ASCII characters: œ¾Ÿ?
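
For illustration, a minimal sketch of the /A and /U switches mentioned in the quote. The file names ansi.txt, unicode.txt and copy.txt are hypothetical, and ansi.txt is assumed to contain text in the current single-byte code page. The switches only affect the output of internal commands (such as TYPE) when it goes to a file or pipe.

```batch
@echo off
rem ansi.txt, unicode.txt and copy.txt are hypothetical file names.

rem Child cmd started with /U: the redirected output of TYPE is Unicode.
cmd /u /c type ansi.txt > unicode.txt

rem Child cmd started with /A (the default): the output stays ANSI.
cmd /a /c type ansi.txt > copy.txt

rem For single-byte text, unicode.txt should be about twice the size of copy.txt.
dir unicode.txt copy.txt
```

Neither switch selects UTF-8; for that, the thread relies on chcp 65001.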

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile

Re: Bug cmd windows 10 encoding utf-8 input redirection

#3 Post by carlos » 26 Apr 2018 19:11

I also tried the type command and for /f, and both produce the same "nandu" text.

Edit: I found that the problem was the "Terminal" font; after changing the font to "Lucida Console", the correct characters are shown.

Thus, it seems that the syntax-redirection.html documentation should be updated: cmd does accept redirection of UTF-8, but the "Terminal" font does not display the characters correctly.
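
One way to check whether the bytes survive the redirection, independently of the console font, is to write the variable back to a file and compare it with the original. This is only a sketch under assumptions: out.txt is a hypothetical file name, and in.txt is assumed to hold a single line with no characters that are special to batch (such as & or %).

```batch
@echo off
setlocal
rem The code page is not restored here; the console stays on 65001.
chcp 65001 >nul

rem Read the first line of in.txt, then write it to out.txt.
set "content="
(set /p "content=") < in.txt
> out.txt echo(%content%

rem Byte-wise comparison; the result does not depend on the console font.
fc /b in.txt out.txt
endlocal
```

If fc reports no differences, the text passed through cmd unchanged and only the display was affected. Because echo always appends a CR LF pair, a difference at the very end of the file may just mean that in.txt has a different line ending or none at all.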
