
Notepad bug

Posted: 02 May 2023 15:39
by Squashman
Hi everyone. Long time no see. I got a new job working in the Unix world, so I kind of gave up on the batch file stuff. But today my old job called me about an issue with a text file they were viewing in Notepad.

The file was nothing but a single line: a trailer record, since the file had no customer data. It was a pure ASCII file.

Position 1 has a T followed by 9 spaces, and position 11 is a 0. It is just a trailer record that tells the data system consuming the file how many records are supposed to be in the file. In today's file the client had zero records, but we still have them send an otherwise empty file with a trailer count of zero.

Code:

T         0 (followed by spaces all the way out to position 1116 and ending with CRLF)
The file was exactly 1118 bytes, with the CRLF in positions 1117 and 1118.
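For anyone who wants to recreate the file, here is a minimal Python sketch of the layout described above. It assumes (hypothetically) that the count field simply starts at position 11 and runs into the space padding; `build_trailer`, `parse_trailer` and `RECORD_WIDTH` are illustrative names, not part of the real data system.

```python
# Sketch of the single-line trailer file described above.
# Assumptions (hypothetical, not from the real system): the record is
# 1116 characters wide, the count starts at position 11 (1-based), and
# everything after the count is space padding.

RECORD_WIDTH = 1116  # characters before the CRLF


def build_trailer(count: int) -> bytes:
    """Return the trailer record as ASCII bytes, terminated by CRLF."""
    record = "T" + " " * 9 + str(count)   # 'T' at position 1, count from 11
    record = record.ljust(RECORD_WIDTH)   # pad with spaces out to 1116
    return (record + "\r\n").encode("ascii")


def parse_trailer(data: bytes) -> int:
    """Extract the record count from a trailer record."""
    line = data.decode("ascii").rstrip("\r\n")
    if not line.startswith("T"):
        raise ValueError("not a trailer record")
    return int(line[10:].strip())         # 0-based index 10 = position 11


if __name__ == "__main__":
    data = build_trailer(0)
    print(len(data))                      # 1118
    print(data[1116:1118] == b"\r\n")     # True: CRLF at positions 1117-1118
    print(parse_trailer(data))            # 0
```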

The file displayed fine in every other text editor except Notepad, and I finally figured out the pattern that causes Notepad to display it incorrectly.
If the first byte of the file is any character from hex 41 to 7E, followed by 767 spaces (768 bytes in total), the file does not display correctly in Notepad. Remove one space from the end so that the file is only 767 bytes, and it displays correctly.

Starting the file with two letters or digits was fine.
A file of 768 spaces displayed fine.
A file that started with a space, then any printable character, then spaces out to 768 bytes displayed fine.
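A hypothesis rather than an explanation: UTF-16 text is made of 2-byte code units, so a 768-byte file splits evenly into them while a 767-byte file does not. The Python sketch below (hypothetical file names `bad.txt` and `good.txt`) writes both variants and reports their length parity. Parity cannot be the whole story, since the original 1118-byte file is also even, and a 768-byte file of only spaces is even yet displays fine.

```python
# Write the two test variants described above and report their byte length
# and whether that length splits into whole UTF-16 code units (2 bytes each).
# File names are hypothetical.


def make_variant(first: str, spaces: int) -> bytes:
    """First character followed by `spaces` spaces, with no line ending."""
    return (first + " " * spaces).encode("ascii")


def fits_utf16(data: bytes) -> bool:
    """UTF-16 needs an even number of bytes (2 per code unit)."""
    return len(data) % 2 == 0


if __name__ == "__main__":
    for name, data in [("bad.txt", make_variant("T", 767)),
                       ("good.txt", make_variant("T", 766))]:
        with open(name, "wb") as f:
            f.write(data)
        print(name, len(data), fits_utf16(data))
```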

From what I can tell, Notepad thinks the file is UTF-16LE when it is in this format.

I'm running Windows 10. It would be interesting to see whether anyone can replicate it on Windows 7 or Windows 11. We replicated it on two Windows 10 computers at my company. I could probably test on Windows Server as well, but I stay away from those like the plague these days.

Re: Notepad bug

Posted: 02 May 2023 16:34
by ShadowThief
I can confirm it's present in Windows 11 as well.

Re: Notepad bug

Posted: 02 May 2023 23:22
by jeb
Hi,

I created the file with

Code:

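@echo off
setlocal enableDelayedExpansion
rem Build a string of 1024 spaces by doubling a single space 10 times
set "S= "
for /L %%n in (1,1,10) do set "S=!S!!S!"

rem Write "T" followed by 767 spaces (768 bytes, no trailing CRLF)
> out.txt (
  <nul set /p ".=T!S:~0,767!"
)
notepad out.txt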

Notepad shows the file as a character similar to an underline followed by a row of crosses (tested on Windows 10).
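Decoding the same bytes as UTF-16LE in Python reproduces what jeb describes: each pair of bytes becomes one character, low byte first, so "T " (0x54 0x20) becomes U+2054 (INVERTED UNDERTIE, the underline-like character) and every pair of spaces becomes U+2020 (DAGGER, the cross). This is a sketch of the decoding only; it says nothing about why Notepad picks UTF-16LE in the first place.

```python
# Decode jeb's 768-byte test file as UTF-16LE to see what Notepad
# displays when it guesses that encoding.
import unicodedata

data = ("T" + " " * 767).encode("ascii")
text = data.decode("utf-16-le")   # pairs bytes, low byte first

print(len(text))                                      # 384 code units
print(hex(ord(text[0])), unicodedata.name(text[0]))   # 0x2054 INVERTED UNDERTIE
print(hex(ord(text[1])), unicodedata.name(text[1]))   # 0x2020 DAGGER
print(text.count("\u2020"))                           # 383
```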

Re: Notepad bug

Posted: 03 May 2023 18:30
by penpen
At least under Win10, Notepad by default tries to guess the encoding (using whatever algorithm).
In your and jeb's examples, Notepad indeed guesses wrong and assumes UTF-16LE.
But explicitly opening both with ANSI encoding works fine.
So I wouldn't call that behaviour a bug, at least under Win10, though it is unexpected and unhandy.
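To illustrate the alternative to guessing, here is a toy encoding guesser in Python. It is NOT Notepad's algorithm (the thread does not know what that is); it trusts a byte-order mark if present, otherwise accepts UTF-8 if the bytes decode cleanly, and otherwise falls back to cp1252 as an assumed ANSI code page. `guess_encoding` is a hypothetical name.

```python
# A toy encoding guesser, NOT Notepad's actual algorithm.
# Order of checks: byte-order mark, then strict UTF-8, then an assumed
# ANSI code page (cp1252).
import codecs


def guess_encoding(data: bytes) -> str:
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"                 # decoder strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"                    # decoder reads the BOM for byte order
    try:
        data.decode("utf-8")               # pure ASCII always passes here
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"                    # assumed ANSI fallback


print(guess_encoding(b"T" + b" " * 767))   # utf-8
print(guess_encoding(b"caf\xe9"))          # cp1252
```

Because this guesser never picks UTF-16 without a BOM, jeb's file comes back as plain ASCII-compatible text; the trade-off is that BOM-less UTF-16 files would be misread.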

At the moment I can't check whether Notepad offers selecting a specific encoding in other versions of Windows.

Re: Notepad bug

Posted: 03 May 2023 19:04
by Squashman
penpen wrote:
03 May 2023 18:30
At the moment I can't check whether Notepad offers selecting a specific encoding in other versions of Windows.
Wow. I never even knew that was an option in Notepad. I have always just opened text files with the mouse. It's interesting to know that it is just guessing the wrong encoding. I will pass that on to my old team.

Re: Notepad bug

Posted: 04 May 2023 06:16
by Aacini
Note also that if you create a new (empty) .txt (or .bat) file and open it with Notepad, the file is saved as UTF-8 by default. All my .BAT files are UTF-8. This usually doesn't matter until you want to insert certain special graphical characters into it. The cure is to "Save As" the file with ANSI encoding...
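A short Python sketch of why the encoding only starts to matter once non-ASCII characters appear: ASCII text is byte-for-byte identical in UTF-8 and ANSI, but a character such as é is stored differently. cp1252 is assumed as the ANSI code page (it varies by Windows locale), and cp437 stands in for a console reading the bytes with the US OEM code page.

```python
# Compare how one non-ASCII character is stored under each encoding, and
# what the UTF-8 bytes look like when read with OEM code page 437.
text = "é"

utf8 = text.encode("utf-8")     # what Notepad's UTF-8 default writes
ansi = text.encode("cp1252")    # "ANSI" on Western-European Windows (assumed)

print(utf8)                     # b'\xc3\xa9' (two bytes)
print(ansi)                     # b'\xe9' (one byte)
print(utf8.decode("cp437"))     # ├⌐ (mojibake in a code page 437 console)

# Pure ASCII text is byte-for-byte identical in both encodings.
print("echo hello".encode("utf-8") == "echo hello".encode("cp1252"))  # True
```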

Antonio

Re: Notepad bug

Posted: 04 May 2023 06:55
by miskox
https://answers.microsoft.com/en-us/win ... 1208e1b389

/A <filename> open file as ansi
/W <filename> open file as unicode
/P <filename> print filename
/PT <filename> <printername> <driverdll> <port> print filename to designated printer
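A minimal Python sketch of calling Notepad with one of these switches. The switches come from the linked post and are undocumented, so treat them as an assumption that may not hold on every Windows version; `notepad_command` is a hypothetical helper. From a batch file the equivalent is simply `notepad /A out.txt`.

```python
# Open a file in Notepad with a forced encoding, using the /A and /W
# switches listed in the linked post (undocumented; may vary by version).
import subprocess
import sys


def notepad_command(path: str, encoding: str = "ansi") -> list:
    """Build the Notepad command line; `encoding` is 'ansi' or 'unicode'."""
    switch = {"ansi": "/A", "unicode": "/W"}[encoding]
    return ["notepad.exe", switch, path]


if __name__ == "__main__" and sys.platform == "win32":
    # Blocks until the Notepad window is closed.
    subprocess.run(notepad_command("out.txt"))
```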


Saso

Re: Notepad bug

Posted: 08 May 2023 09:32
by kwsiebert
Notepad in Windows 7 behaves this way as well, which I almost didn't expect, since I know there were changes between Windows 7 and 10 (regarding how Notepad handles Unix text files, in particular). Windows 7 Notepad does allow you to select the encoding when you open a file.
Aacini wrote:
04 May 2023 06:16
Note also that if you create a new (empty) .txt (or .bat) file and open it with Notepad, the file is saved as UTF-8 by default. All my .BAT files are UTF-8. This usually doesn't matter until you want to insert certain special graphical characters into it. The cure is to "Save As" the file with ANSI encoding...

Antonio
This is a familiar problem for me. I work with an order processing system that uses ANSI files for all of its input and output. If we accidentally provide a UTF-8 file as order input, it just... doesn't load the order, with no errors or warnings. Never buy a system from the lowest bidder, especially after your tech team tells you it's not sufficient for your needs.
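One way to guard such a system, sketched in Python under the assumption that its ANSI code page is cp1252: re-encode UTF-8 input (with or without a BOM) before it is submitted, and fail loudly on characters the code page cannot represent rather than letting the order vanish silently. `utf8_to_ansi` is a hypothetical name.

```python
# Convert UTF-8 input (with or without BOM) to ANSI before handing it to a
# system that only accepts ANSI. cp1252 is an assumed code page; characters
# it cannot represent raise UnicodeEncodeError instead of being replaced.


def utf8_to_ansi(data: bytes, codepage: str = "cp1252") -> bytes:
    text = data.decode("utf-8-sig")   # strips a leading BOM if present
    return text.encode(codepage)      # strict: raises on unmappable chars


if __name__ == "__main__":
    print(utf8_to_ansi(b"\xef\xbb\xbfcaf\xc3\xa9"))   # b'caf\xe9'
```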

Re: Notepad bug

Posted: 28 May 2023 09:38
by DOSadnie
Aacini wrote:
04 May 2023 06:16
[...]
if you create a new (empty) .txt (or .bat) file
[...]
file is saved as UTF-8 by default
[...]
This usually doesn't matter until you want to insert certain special graphical characters into it. The cure is to "Save As" the file with ANSI encoding...
Or to add

Code:

chcp 65001 >nul
at its beginning

Re: Notepad bug

Posted: 28 May 2023 23:54
by Aacini
DOSadnie wrote:
28 May 2023 09:38

. . .

Or to add

Code:

chcp 65001 >nul
at its beginning
Mmmm... No... You are confusing a couple of concepts...

The file (the Batch file on disk, or any file in this case) is always created by Notepad.exe with UTF-8 as the default encoding. This has no relation to the contents of the file. The UTF-8 encoding is set the first time the file is saved by Notepad.exe, regardless of the file extension or the file contents.

If the file is a Batch file and you include the chcp 65001 command at the beginning, then when the Batch file is executed the console code page is set to 65001 (UTF-8), so the console can handle Unicode characters. This has no relation to the encoding of the Batch file itself, which remains the same (i.e. UTF-8) after the Batch file ends execution.

Antonio

Re: Notepad bug

Posted: 29 May 2023 02:13
by DOSadnie
DOSadnie wrote:
28 May 2023 09:38
[...]

Code:

chcp 65001 >nul
[...]
Aacini wrote:
28 May 2023 23:54
Mmmm... No... You are confusing a couple of concepts...
[...]
But it does come down to the fact that although UTF-8 encoding is allowed for a BAT file, the default code page used by the Command Prompt (on US-English systems) is 437, which is a non-UTF DOS OEM character set.

And so, without the above, a script containing, e.g., this

Code:

@echo off
echo. █
pause
would not display the block character it contains but some gibberish substitute?
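A Python sketch of what happens to the block character, assuming the console uses code page 437 as stated above (other locales use other OEM code pages): a UTF-8 saved batch file stores █ (U+2588) as three bytes, and code page 437 shows each byte as a separate character.

```python
# Why the block character turns into gibberish: the UTF-8 bytes of U+2588
# are interpreted one byte at a time by a console using code page 437.
block = "\u2588"                       # █ FULL BLOCK

utf8_bytes = block.encode("utf-8")     # how a UTF-8 saved .bat stores it
print(utf8_bytes)                      # b'\xe2\x96\x88'
print(utf8_bytes.decode("cp437"))      # Γûê (what code page 437 shows)

# Two ways to make the bytes and the console agree:
print(utf8_bytes.decode("utf-8") == block)   # True: console on chcp 65001
print(block.encode("cp437"))                 # b'\xdb': bytes of a CP437 file
```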