issue with IF statement and weird character

ayce
Posts: 9
Joined: 27 Dec 2021 17:15

Re: issue with IF statement and weird character

#16 Post by ayce » 29 Dec 2021 13:52

Compo wrote:
28 Dec 2021 12:11
Just be aware everyone that the OP specifically stated
ayce wrote:
27 Dec 2021 17:36
I can type it in the command
Which means that they do know what the character is, and how to type it, and there is therefore no reason for them not to provide that information.
There's a flaw in the theory, which is the fact I'm faced with the character as a result of a process I do not have control over. The character is ONLY in the name of the file.

What I'm saying is, that there is code that is storing the character as a variable. Then it does tasks with it, but depending on what they do, it fails, like an IF statement for example. I'm sure other commands will fail as well. The character however, is in the variable.

If you copy-paste that character (in the CMD box), that character (and ONLY that character) is removed from the Windows copy buffer.
Last edited by ayce on 29 Dec 2021 13:56, edited 1 time in total.

#17 Post by ayce » 29 Dec 2021 14:12

My config is this:
Active code page: 1252
Unless I'm wrong, that's pretty default...

So, I'm somewhat further with the analysis now. First of all: the character is meaningless, so I don't care too much what it is or what it is meant for. My goal is simply to remove the character.

But, to do that, I MUST use a variable containing that bad character, otherwise it will be hard to rename a file, if I can't tell the system what the file is named. That is, the name WITH the character.

So, the bug/feature (whatever suits you) is that the character gets lost:
- when you copy paste it from the CMD screen, to anything
- when you redirect to a file (.CMD file, .TXT file, whatever)

Also, I noted that in Windows Explorer, the character is NOT shown. I earlier stated it is shown, but it is NOT shown in Windows Explorer.
So that means that Windows Explorer is also removing the character when it displays the name of the file. As a result, it thus displays the wrong file name.

Now, CMD displays the name correctly if you use DIR or similar.
Also, and that is what the code is doing, it handles the character correctly if I use it in a FOR loop. I can get that one character and put it into a variable. I can show the content of the variable, and it always shows the correct content (which, as stated earlier, looks like a small flag on a pole; that's my best description).

So, the thing is that I want to construct a simple command to change the name of the file: from the real name (with that character) to the same name, but excluding the characters I don't want, which would be just that one character here (but the code would also allow removal of any character I want). But definitely, removal of useless, buggy characters is a plus. The biggest problem is that the character gets removed from the parsed result by some commands, like IF.

So, if I do this:

IF %my_character%+==+ echo K

... it technically works, but the result is unreliable as my character can also be a whitespace.
I know that I should be using double quotes, but with double quotes the bug/feature behaviour is the same: when the line gets parsed, the character is simply removed.

And, when I redirect my variable to a file

echo %my_character% > test.txt

The content of that file is, just this one character:

?

So, the problem is double:
- character gets lost while doing certain manipulation in CMD
- character gets translated incorrectly, when redirected to a file

#18 Post by ayce » 29 Dec 2021 14:22

Compo wrote:
28 Dec 2021 09:22
As I initially stated, and also mentioned by penpen above, it may simply be a chosen codepage issue.

I'd suspect your actual character is most likely an extended ASCII upper or lower case i with a grave, or acute. So in codepage 1252 that would be dec 204 and 205 (hex CC and CD), or dec 236 and 237 (hex EC and ED). Examples:

Code:

Ì ì Í í
Please be aware that in the used font here, and in your console, the upper case version may, or may not, include a horizontal top and base, and the lower may or may not include a horizontal base.
A hex conversion (using cygwin OD.exe) tells me the character is 3F

... which is:

?

So, ... that fails. But that is because it's not an ASCII character originally. What it is ...

Keep in mind, this is a name of a file. It cannot hold ? as a character in its name
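
For what it's worth, a 3F at that position is consistent with a code-page conversion rather than with the real file name. A minimal Python sketch, assuming the character simply has no mapping in code page 1252: the sample name `a♫b.txt` and the helper `to_code_page` are hypothetical, and Python's `errors="replace"` only approximates what the console does when it redirects output.

```python
def to_code_page(text, code_page="cp1252"):
    """Encode text to a single-byte code page, replacing unmappable characters with '?'."""
    return text.encode(code_page, errors="replace")


# Hypothetical file name containing U+266B, which code page 1252 cannot represent.
name = "a\u266bb.txt"
data = to_code_page(name)
print(data.hex(" "))  # 61 3f 62 2e 74 78 74
```

The 3f in the middle is the replacement character, not something stored on disk, which would explain why a "?" shows up even though a file name cannot contain one.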

Compo
Posts: 600
Joined: 21 Mar 2014 08:50

#19 Post by Compo » 29 Dec 2021 14:50

ayce wrote:
29 Dec 2021 13:52
Compo wrote:
28 Dec 2021 12:11
Just be aware everyone that the OP specifically stated
ayce wrote:
27 Dec 2021 17:36
I can type it in the command
Which means that they do know what the character is, and how to type it, and there is therefore no reason for them not to provide that information.
There's a flaw in the theory, which is the fact I'm faced with the character as a result of a process I do not have control over.
No, my theory was that you said you could type it, when clearly you cannot!

Aside from that, if you want to remove characters outside of the ASCII range, perhaps using PowerShell may help. Please take a look at this answer to a question looking to do that with strings, as it may be of assistance to you.
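
As an alternative illustration of the same idea (not the PowerShell answer referred to above), here is a minimal Python sketch that drops every character outside the ASCII range from file names in one directory. The function names `strip_non_ascii` and `rename_without_non_ascii` are hypothetical, and the sketch assumes that skipping a file is acceptable when the cleaned name is empty or already taken.

```python
import os


def strip_non_ascii(name):
    """Return name with every character above code point 127 removed."""
    return "".join(ch for ch in name if ord(ch) < 128)


def rename_without_non_ascii(directory):
    """Rename entries in directory whose names contain non-ASCII characters.

    Returns a list of (old_name, new_name) pairs for the entries renamed.
    """
    renamed = []
    for entry in os.listdir(directory):
        cleaned = strip_non_ascii(entry)
        # Skip names that need no change or that would become empty.
        if not cleaned or cleaned == entry:
            continue
        target = os.path.join(directory, cleaned)
        # Assumption: never overwrite an existing file.
        if os.path.exists(target):
            continue
        os.rename(os.path.join(directory, entry), target)
        renamed.append((entry, cleaned))
    return renamed
```

Calling `rename_without_non_ascii(".")` would process the current directory; try it on a copy of the files first.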

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México

#20 Post by Aacini » 29 Dec 2021 15:09

ayce wrote:
29 Dec 2021 13:50
Aacini wrote:
28 Dec 2021 11:31
ayce wrote:
28 Dec 2021 05:17


I can't paste it because the internal Windows text copy feature also removes it from the text being pasted
Do the following:
  1. From the cmd.exe command-line, enter: dir > output.txt
  2. Open output.txt, Select all, Copy text
  3. Below this reply, post a new reply.
  4. Enter left-square-bracket+code+right-square-bracket in a line, then paste the text below it, and terminate with left-square-bracket+/code+right-square-bracket
  5. Post the reply...
If you redirect to a file, it converts the character to a ?
So, step 1 fails.

Now, that is interesting, because the string itself is the name of a file, and as you know, a ? is not an allowed character in a file name. So redirecting also messes up the encoding of the character, just as the IF statement does.


Then try:

Code:

cmd /U /C dir ^> output.txt
This creates output.txt as a UTF-16 LE Unicode file...
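
Once such a file exists, the code point can be read back without involving the console at all. A minimal Python sketch, assuming output.txt is UTF-16 LE as described; the helper name `read_cmd_unicode_output` is hypothetical, and the byte order mark handling is only a precaution.

```python
from pathlib import Path


def read_cmd_unicode_output(path):
    """Decode a file assumed to be UTF-16 LE, such as one redirected under cmd /U."""
    raw = Path(path).read_bytes()
    text = raw.decode("utf-16-le")
    # Drop a leading byte order mark if one happens to be present.
    return text.lstrip("\ufeff")


if __name__ == "__main__" and Path("output.txt").exists():
    for line in read_cmd_unicode_output("output.txt").splitlines():
        # ascii() shows non-ASCII characters as \uXXXX escapes,
        # so the code point is visible even if the font cannot draw it.
        print(ascii(line))
```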

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

#21 Post by aGerman » 30 Dec 2021 07:18

I updated the code in post #10 in order to handle names with characters >0xff.
For a file with name "a♫b.txt" it writes

Code:

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   61 00 6B 26 62 00 2E 00 74 00 78 00 74 00        a.k&b...t.x.t.
The code point "Beamed Eighth Notes" has value U+266B. Because of the Little Endian byte order the bytes representing it are 6B 26.
Now you should really be able to tell us what character you are talking about.
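
The byte sequence in that dump can be cross-checked independently. A minimal Python sketch (not the code from post #10; the helper name `hex_dump_utf16le` is hypothetical) that encodes the same sample name as UTF-16 LE:

```python
def hex_dump_utf16le(text):
    """Return the UTF-16 LE bytes of text as space-separated uppercase hex pairs."""
    return text.encode("utf-16-le").hex(" ").upper()


# "a", U+266B (Beamed Eighth Notes), "b.txt"
print(hex_dump_utf16le("a\u266bb.txt"))
# 61 00 6B 26 62 00 2E 00 74 00 78 00 74 00
```

Each character takes two bytes with the low byte first, which is why U+266B appears as 6B 26.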

Steffen

#22 Post by Compo » 31 Dec 2021 06:37

Here is my analysis of your issue thus far:

You are running some code which you've not shown us, within a FOR loop, and which outputs filenames to a variable.
ayce wrote:
29 Dec 2021 13:52
I'm faced with the character as a result of a process I do not have control over. The character is ONLY in the name of the file.

What I'm saying is, that there is code that is storing the character as a variable. Then it does tasks with it,
ayce wrote:
29 Dec 2021 14:12
CMD displays the name correct, if you use DIR or such.
Also, and that is what the code is doing, it uses it correctly if I use it in a FOR loop. I can get that one character, and put it into a variable. I can show the content of the variable, and it would always show the correct content.
Then you want to parse this filename variable to remove one or more characters from it, i.e. rename the file.
ayce wrote:
28 Dec 2021 07:53
The issue is not just this character, it's an issue with all these kind of characters. To my knowledge, these are printable characters, outside of ASCII. I don't need to do anything with these characters, other than removing them.
ayce wrote:
28 Dec 2021 08:02
The one character I'm analyzing can also be just one double quote, so I must do extra effort there anyway.
No, it cannot: a double quote is invalid in a file or directory name!
ayce wrote:
29 Dec 2021 14:12
my character can also be a whitespace.
ayce wrote:
29 Dec 2021 14:12
First of all: the character is meaningless, so I don't care too much what it is or what it is meant for. My goal is simply to remove the character.

But, to do that, I MUST use a variable containing that bad character, otherwise it will be hard to rename a file, if I can't tell the system what the file is named. That is, the name WITH the character.
To do that you are performing IF / IF NOT comparisons using those characters individually, one at a time.
ayce wrote:
27 Dec 2021 17:36
I'm testing some code to handle 1-character checks and functions in a CMD script, and I stumbled on this:

C:\> if X+==+ echo K
K
C:\>

BUT - I changed the command slightly, in the way that the X is a special character, not just the letter X.
ayce wrote:
28 Dec 2021 07:46
The issue actually is not the IF command; the issue is that a character gets lost when being parsed, for example in an IF command.
I therefore stated:
Compo wrote:
28 Dec 2021 12:11
IMO, I would suggest that this is not a specific issue, and the OP is hoping that somebody will come along with a solution to handle every single possible character, without any restrictions, regardless of language, locale, character set, or encoding etc.
and you confirmed that.
ayce wrote:
29 Dec 2021 14:12
So, the thing is that I want to construct a simple command to change the name of the file: from the real name (with that character) to the same name, but excluding the characters I don't want, which would be just that one character here (but the code would also allow removal of any character I want). But definitely, removal of useless, buggy characters is a plus.
Could you therefore please post all of your code, so that we can determine a methodology to work with it, or around it, whilst still achieving your intended goal. Alongside it, please include at least an image or image link showing some of those actual filenames, by way of one or more screenshots.

We are not here to provide you with a fully working scripted solution you can run against every single illegally downloaded video and/or music file to rename every file to your specific/personal requirements.
We are happy to help, but not to do everything for you.
