JSORT.BAT v4.2 - problems with german umlauts
Posted: 29 Jul 2022 02:11
by Savion
Hello.
I have tested this nice script, but JSORT has problems with German umlauts.
If I process a text file with this command:
Code: Select all
JSORT old.txt /p 12 /I /N /o new.txt
the resulting new.txt has errors in the German umlauts.
old.txt is an ANSI file, and this is new.txt (ANSI) with the errors:
- 1.jpg (52.96 KiB)
If I convert new.txt to UTF-8, it looks like this:
- 2.jpg (54.31 KiB)
The correct result would be:
- 3.jpg (52.28 KiB)
What is the right syntax for the output file?
I hope someone can help me.
Thanks.
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 29 Jul 2022 06:06
by aGerman
Without trying to reproduce the behavior yet - just an idea: What happens if you place a
before calling JSORT?
Steffen
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 29 Jul 2022 13:22
by Savion
Hello Steffen.
I have tested it.
Same error - none of the umlauts are right.
I have no more ideas.
Do you have another idea?
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 30 Jul 2022 04:42
by aGerman
I'm afraid fixing this would require a refactoring of JSORT. E.g. reading and writing files in the JScript section instead of the Batch section of this hybrid script.
Steffen
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 30 Jul 2022 07:15
by Savion
Thanks for the information, Steffen.
Do you know a program or script that can correct umlauts?
Then I would run it over the txt files.
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 30 Jul 2022 08:15
by miskox
Can't SORT do the job? I see that only /N could be a problem. But if you don't need it... or you can rearrange the input file.
Saso
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 30 Jul 2022 08:34
by Savion
Hello miskox.
JSORT sorts correctly; that is not the problem. The only problem is the German umlauts in the generated new.txt.
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 30 Jul 2022 09:37
by aGerman
I guess Saso refers to the SORT command that ships with Windows anyway.
Steffen
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 30 Jul 2022 11:07
by aGerman
I found a relatively simple fix:
Replace all occurrences of
WScript.Echo
with
WScript.StdOut.WriteLine
in JSORT.BAT
Steffen
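The replacement described above can also be done mechanically. The following is only a minimal sketch in JavaScript (Node.js), not part of JSORT.BAT: the helper name patchEchoCalls and the sample lines are made up for illustration, and it assumes the script spells the call exactly as WScript.Echo.

```javascript
// Hypothetical helper (not part of JSORT.BAT): swap every WScript.Echo
// call for WScript.StdOut.WriteLine in a script's source text.
function patchEchoCalls(source) {
  // The dot is escaped so only the literal text "WScript.Echo" matches;
  // the g flag replaces all occurrences, not just the first one.
  return source.replace(/WScript\.Echo/g, "WScript.StdOut.WriteLine");
}

// Made-up sample lines standing in for the JScript part of a hybrid script.
const sample = [
  "WScript.Echo(line);",
  "if (count) WScript.Echo(count);"
].join("\r\n");

const patched = patchEchoCalls(sample);
console.log(patched);
```

To apply this to a real file, the file should be read and written with a single-byte encoding (for example "latin1" in Node.js) so that ANSI bytes in the script are not altered along the way.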
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 30 Jul 2022 12:17
by miskox
aGerman wrote: ↑30 Jul 2022 09:37
I guess Saso refers to the SORT command that ships with Windows anyway.
Steffen
Yes Steffen, you are right: I meant SORT.exe, which is part of the Windows OS.
Saso
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 30 Jul 2022 21:10
by Savion
Steffen - YES, THIS IS IT!
THANKS, THANKS, THANKS!
No more problems with umlauts.
Just replace
WScript.Echo
with
WScript.StdOut.WriteLine
Have a beautiful Sunday, Steffen.
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 31 Jul 2022 10:12
by miskox
Looks like Dave has some work to do.
Thanks Steffen.
Saso
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 01 Aug 2022 07:26
by aGerman
Not sure if Dave will still be maintaining this script. I should probably add this workaround to the original topic since the last commenter in 2019 seemingly faced the same problem.
Steffen
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 06 Aug 2022 05:59
by Sponge Belly
aGerman wrote:
Replace all occurrences of WScript.Echo with WScript.StdOut.WriteLine
Thanks for the tip, Steffen!
But can you explain why replacing WScript.Echo with WScript.StdOut.WriteLine solves the umlaut problem?
- SB
Re: JSORT.BAT v4.2 - problems with german umlauts
Posted: 06 Aug 2022 06:54
by aGerman
Quick investigation.
Script:
Code: Select all
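@if (0)==(0) echo off
rem Batch part: pass the dropped file to the JScript part via stdin, send stdout to the console.
<%1 >CONOUT$ cscript //nologo //e:jscript "%~fs0"
pause
goto :eof @end
// JScript part: read the first line of the test file.
var ch = WScript.StdIn.ReadLine();
// Print the code of the first character in hex.
WScript.Echo(ch.charCodeAt(0).toString(16));
// Write the same character with both output methods for comparison.
WScript.Echo(ch);
WScript.StdOut.WriteLine(ch);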
Precondition for the output shown below:
ACP: 1252
OEMCP: 850
A test file containing only the byte 0xE9
Known character representation for byte 0xE9:
é in my ACP
Ú in my OEMCP
Output if the test file is dropped onto the script:
Code: Select all
e9
é
Ú
Drücken Sie eine beliebige Taste . . .
Conclusion:
- WScript.StdIn.ReadLine reads byte 0xE9 without any charset conversion.
- WScript.Echo performs a conversion from ACP to OEMCP. The new value needs to be 0x82 to be represented as é in CP 850. Redirected to a file and interpreted in ACP, byte 0x82 would be a "single low quotation mark" character. This can be verified by replacing CONOUT$ with a file name in the script.
- WScript.StdOut.WriteLine passes the original byte value through unchanged. It appears as Ú in CP 850. Redirected to a file and interpreted in ACP, it would still be é.
Steffen
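The byte values in the conclusion above can be traced with a small sketch in JavaScript (Node.js). It is an illustration only, not how Windows Script Host works internally: the two tables hold just the four code page entries mentioned in the post, and the names CP1252, CP850, decode and encode are made up for this example.

```javascript
// Partial code page tables: only the entries needed for this demo.
const CP1252 = { 0xE9: "\u00E9", 0x82: "\u201A" }; // ACP:   E9 = é, 82 = single low quotation mark
const CP850  = { 0xE9: "\u00DA", 0x82: "\u00E9" }; // OEMCP: E9 = Ú, 82 = é

// Look up the character a byte stands for in the given code page.
function decode(byte, table) {
  return table[byte];
}

// Find the byte that stands for a character in the given code page.
function encode(ch, table) {
  for (const key of Object.keys(table)) {
    if (table[key] === ch) return Number(key);
  }
  throw new Error("character not in demo table: " + ch);
}

const original = 0xE9; // the single byte in the test file

// WScript.Echo as described above: the byte is taken as ACP and re-encoded for OEMCP.
const echoed = encode(decode(original, CP1252), CP850);

// WScript.StdOut.WriteLine as described above: the byte passes through unchanged.
const written = original;

// Columns: byte value, how the console (CP 850) shows it, how an ANSI file (CP 1252) shows it.
console.log(echoed.toString(16), decode(echoed, CP850), decode(echoed, CP1252));    // 82 é ‚
console.log(written.toString(16), decode(written, CP850), decode(written, CP1252)); // e9 Ú é
```

The first output line matches the WScript.Echo case (correct on the console, wrong in an ANSI file), the second the WScript.StdOut.WriteLine case (wrong on the console, correct in an ANSI file).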