I'm still trying to improve the automatic support for multiple text file encodings in my System Tools library.
Two things need improving now:
- The new Windows Terminal can now display all Unicode characters, including those beyond the 16-bit plane 0 (the Basic Multilingual Plane).
I thought that my method of using UTF-16 for console output would automatically support them... But it does not.
On the other hand, when (and only when) the current code page is 65001, writing UTF-8 text in binary mode works, even with emoji in plane 1, like "👍" = "\U0001F44D" = "\xF0\x9F\x91\x8D".
If anybody has an idea why UTF-16 does not work (e.g. "👍" = "\U0001F44D" = "\uD83D\uDC4D"), I'm interested!
- I already supported files both in the Windows system encoding (e.g. code page 1252 in US versions of Windows) and in UTF-8.
I'm now adding code to distinguish between those two, as well as ASCII, UTF-16, UTF-32, and binary files.
I've tested it on my own system (a US version of Windows, even though I live in France), with good results.
I'd appreciate it if other forum users could try it on their own text files and tell me whether the results are correct.
Especially if they have non-US versions of Windows! And even more so if they use a non-Latin script!
Encoding.exe will be included in a future release of my System Tools library.
For now, I've put a beta version here: http://jf.larvoire.free.fr/progs/encoding.exe
Usage:
encoding [OPTIONS] [PATHNAME [...]]
It also supports wildcards. For example:
C:\JFL\Temp>encoding t*.txt
UTF-8 t.txt
ASCII t0.txt
ASCII t1.txt
UTF-16 t16.txt
UTF-8 t2.txt
ASCII t20.txt
UTF-8 t3.txt
Windows t4.txt
ASCII t5.txt
ASCII t6.txt
ASCII t7.txt
ASCII t8.txt
ASCII tab.txt
ASCII tab2.txt
ASCII tabs.txt
UTF-8 tb.bat .txt
ASCII temp.txt
Windows test.txt
Windows test1.txt
UTF-16 test16.txt
ASCII test2.txt
UTF-8 test8.txt
UTF-8 test_u8.txt
ASCII tW.txt
C:\JFL\Temp>dump t1.txt
Offset 00 04 08 0C 0 4 8 C
-------- ----------- ----------- ----------- ----------- -------- --------
00000000 66 69 72 73 74 0D 0A first
C:\JFL\Temp>dump t2.txt 0 40
Offset 00 04 08 0C 0 4 8 C
-------- ----------- ----------- ----------- ----------- -------- --------
00000000 4C 65 20 70 61 73 73 61 67 65 20 64 65 20 6C 61 Le passa ge de la
00000010 20 53 61 76 6F 79 61 72 64 65 0D 0A 6C 61 20 6C Savoyar de la l
00000020 61 0D 0A 0D 0A 43 27 65 73 74 20 6C 65 20 70 6F a C'e st le po
00000030 69 6E 74 20 63 6C C3 A9 20 64 65 20 6C 61 20 74 int cl�� de la t
C:\JFL\Temp>dump t16.txt
Offset 00 04 08 0C 0 4 8 C
-------- ----------- ----------- ----------- ----------- -------- --------
00000000 FF FE 41 00 3D D8 09 DE 42 00 ��A =� � B
C:\JFL\Temp>
Jean-François