a little unicode related subtopic
This is the part of the Unicode discussion that started in the set /a thread; it has been moved from there to here.
Last edited by taripo on 13 Dec 2011 15:48, edited 1 time in total.
Re: a little unicode related subtopic
Ed Dyreen wrote:
...
for /f "usebackq tokens=1-3 delims=¦" %%b in ( '"1"¦"4"¦"RandomNumber"' ) do %Get.RandomNumber.TokenSTR% %()% >nul
call :LABEL%RandomNumber%
I see that if I try to copy/paste ¦ into a cmd prompt, I get a white mark. It's the ANSI broken bar, \xA6.
If I put that in a batch file and TYPE blah.bat, it shows ª (the IBM Extended ASCII symbol \xA6).
Why not write ª in the code? I guess that's what is executed when you run a batch file containing it.
Do you have ¦ on your keyboard mapped to the broken bar, so it's convenient to type? If you do, how do you then type a proper pipe |? Does your keyboard have both characters?
Your batch file works.
But if I try to get something simple to work, that character doesn't work for me as a delimiter. Comma does, but ¦ doesn't.
notepad fff.bat displays:
for /f "delims=¦" %%f in (a¦b) do echo %%f
for /f "delims=," %%f in (a,b) do echo %%f
C:\gaa>type fff.bat
for /f "delims=ª" %%f in (aªb) do echo %%f
for /f "delims=," %%f in (a,b) do echo %%f
C:\gaa>echo abc>a
C:\gaa>echo def>b
C:\gaa>fff
C:\gaa>for /F "delims=ª" %f in (aªb) do echo %f
The system cannot find the file aªb.
C:\gaa>for /F "delims=," %f in (a b) do echo %f
C:\gaa>echo abc
abc
C:\gaa>echo def
def
C:\gaa>
Re: a little unicode related subtopic
Taripo wrote: If I put that in a batch file, and TYPE blah.bat it shows ª
Ed wrote: I have no clue, I don't have that problem. You can use whatever delimiter you like.
C:\PROFSYS\ADMIN>prompt $
@echo off
for /f "tokens=1-26 delims=¦" %a in ( "this¦works" ) do echo.a=%~a_ &echo.b=%~b_
a=this_
b=works_
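Ed's point that any character can serve as the delimiter follows from how `for /f` tokenizes a line. The Python function below is a simplified, hypothetical model of `tokens=first-last delims=...` (the name `for_f_tokens` is made up, and it ignores `eol`, `usebackq` and the trailing `*` token), not cmd's actual implementation:

```python
def for_f_tokens(line, delims, first, last):
    """Hypothetical model of: for /f "tokens=first-last delims=<delims>".

    Every character in `delims` is a separator, runs of separators count as
    one, and leading separators are skipped, so no empty tokens are produced.
    """
    tokens = []
    current = []
    for ch in line:
        if ch in delims:
            # close the token in progress, if any (this collapses delimiter runs)
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    # for /f numbers tokens from 1; asking for 1-26 returns as many as exist
    return tokens[first - 1:last]


print(for_f_tokens("this¦works", "¦", 1, 26))
print(for_f_tokens('"1"¦"4"¦"RandomNumber"', "¦", 1, 3))
```

In the model the quotes stay part of each token; in batch, the `~` modifier (as in `%~a`) strips them.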
Re: a little unicode related subtopic
What version of Windows?
I'm on XP.
This pic shows the straight pipe and the broken bar as they appear in a webpage in Chrome, or in Notepad.
Here is a link to fff.bat:
http://ge.tt/9ApVvRA
This shows fff.bat TYPEd, run, and opened in Notepad.
Re: a little unicode related subtopic
I have just replaced the delimiter in your script, and it (still) works, this time without the weirdness. Before, I tried replacing the delimiter in your script with a comma, and it failed, but replacing it with _ worked.
I'm not sure what's happening with the | and ¦, though.
What do yours look like: one straight pipe, one broken bar? You can see my screenshots.
Re: a little unicode related subtopic
Ed wrote: Hmm, do you really need command.COM?
Try .CMD as the batch file extension.
Taripo wrote:
.cmd looks exactly the same.
I included a link to the file three posts up.
I am curious whether you get a different result from exactly the same batch file.
It works, though.
And the main thing is that your script works too, and without weirdness, since I changed the delimiter to _.
Ed wrote:
I get decimal values 166 and 167 from 0xA6 and 0xA7; however, ¦ is dec 221 = 0xDD.
Re: a little unicode related subtopic
Ed Dyreen wrote:
I get a decimal value 166 and 167 from 0xA6, and 0xA7, however ¦ is dec 221 = 0xDD
166 (decimal) is 0xA6 (hexadecimal), etc., so that would be the same for anybody.
¦ is not 0xDD; ▌ is 0xDD.
My CMD cannot print ¦, but I can try to paste it into the console manually.
It doesn't exist in IBM Extended ASCII, so it maps to another character:
it prints ▌, which is 0xDD, and that works as a delimiter.
C:\gaa>for /F "tokens=1-2 delims=▌" %f in ("a▌b") do @echo %f - %g
a - b
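The disagreement over 0xDD is a code-page question: the same byte is ▌ in cp437 and ¦ in cp850. A quick Python check, assuming Taripo's console uses cp437 (the usual US default; the thread never states it) while Ed's uses cp850, as he says later:

```python
byte = b"\xdd"

# cp850 (Western European OEM) maps 0xDD to U+00A6 BROKEN BAR
in_850 = byte.decode("cp850")
# cp437 (US OEM, assumed for Taripo's console) maps 0xDD to U+258C LEFT HALF BLOCK
in_437 = byte.decode("cp437")

print(f"0xDD in cp850: {in_850} (U+{ord(in_850):04X})")
print(f"0xDD in cp437: {in_437} (U+{ord(in_437):04X})")

# The other direction: cp850 has a slot for the broken bar, cp437 has none.
print("broken bar in cp850:", "¦".encode("cp850").hex())
try:
    "¦".encode("cp437")
except UnicodeEncodeError:
    print("broken bar has no slot in cp437")
```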
My CMD prompt displays IBM Extended ASCII. It sees 0xA6, doesn't know it's a broken bar, and displays whatever 0xA6 is in Extended ASCII, which is row A, column 6 here:
http://www.jimprice.com/ascii-dos.gif
It's also listed as 0xA6 here:
http://ascii-table.com/ascii-extended-pc-list.php
It's the little a thing (ª).
What version of Windows are you using? It looks like
What text editor are you using?
I am using Notepad and saving as ANSI. It has a broken bar because ANSI does, and it saves it as 0xA6. When the CMD prompt executes or TYPEs a bat file containing 0xA6, it thinks it is the little 'a' thing, because, as you see, that is what 0xA6 is in Extended ASCII.
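Taripo's explanation can be checked byte by byte. The Python sketch below assumes his console's OEM code page is cp437 (not stated in the thread). It encodes the broken bar the way Notepad's ANSI (cp1252) save does, then decodes that byte the way the console would. cp850 is included for comparison:

```python
# Notepad "ANSI" on a Western system is Windows code page 1252.
saved = "¦".encode("cp1252")   # the single byte written to the .bat file

# The console decodes the same byte with its OEM code page instead.
shown_437 = saved.decode("cp437")  # assumed code page of Taripo's console
shown_850 = saved.decode("cp850")  # the OEM code page Ed and aGerman use

print("byte on disk:", saved.hex())
print("cp437 console shows:", shown_437)
print("cp850 console shows:", shown_850)
```

Both OEM code pages put ª at 0xA6, which matches the TYPE output in the first post.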
Re: a little unicode related subtopic
That 0xDD character is not the pipe | or ¦.
It doesn't work as a pipe.
C:\WINDOWS>dir ▌ more
Volume in drive C has no label.
Volume Serial Number is FC9D-4769
Directory of C:\WINDOWS
Directory of C:\WINDOWS
File Not Found
C:\WINDOWS>
It sounds like maybe you type ¦ and it comes out as ▌ in your bat file.
But then what would you type for a pipe to come out?
What are you using to write your bat file?
Re: a little unicode related subtopic
Which codepages are used depends on your locale settings. You will find them in the registry:
@echo off &setlocal
for /f "tokens=2*" %%i in ('reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" /v "OEMCP"') do set /a OEMCP=%%j
for /f "tokens=2*" %%i in ('reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" /v "ACP"') do set /a ACP=%%j
set OEMCP
set ACP
pause>nul
That code returns for my German settings:
OEMCP=850
ACP=1252
That means codepage 850 (the OEM codepage the console uses, loosely called "ASCII") and codepage 1252 (ANSI).
If you save your code as ANSI, however, it is interpreted as OEM ("ASCII") in your command window. For that reason some characters are not displayed the same way.
E.g. Hex 0xA9 represents character © in codepage 1252 but character ® in codepage 850.
(ref. http://en.wikipedia.org/wiki/Code_page_850, http://en.wikipedia.org/wiki/Windows-1252)
BTW: That discussion is a bit off topic in a "SET /A" thread, isn't it?
Regards
aGerman
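aGerman's point applies to any byte above 0x7F. A small sketch using his registry values as defaults (ACP=1252, OEMCP=850); the helper name `ansi_vs_oem` is made up for this example:

```python
def ansi_vs_oem(byte_value, acp="cp1252", oemcp="cp850"):
    """Return how one byte reads in the ANSI code page and in the OEM code page.

    A few bytes (e.g. 0x81) are undefined in cp1252 and raise
    UnicodeDecodeError here.
    """
    raw = bytes([byte_value])
    return raw.decode(acp), raw.decode(oemcp)


for value in (0xA7, 0xA9):   # 167, which Ed mentioned, and aGerman's 0xA9 example
    ansi, oem = ansi_vs_oem(value)
    print(f"0x{value:02X}: ANSI {ansi}, OEM {oem}")
```

Passing `oemcp="cp437"` shows the same bytes as a US console would read them.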
Re: a little unicode related subtopic
Taripo reported codepage problems in this topic
viewtopic.php?f=3&t=1817&start=0&hilit=code+page+850
And took the discussion to a new level here
viewtopic.php?f=3&t=2550&start=0
To be very clear: I use this character, as you see it, '¦', as a data delimiter. In my case this is safe, as it should never occur in any delimited data. I use codepage 850, and my batch has no problems with it at all!
for /f "usebackq tokens=1-3 delims=¦" %%b in ( '"MinimumSTR"¦"MaximumSTR"¦"StoreVAR"' ) do
If I use
chcp 850
will my batch then finally work for people with different codepages?
Re: a little unicode related subtopic
All the Unicode-related posts in the set /a thread you link to are now in this thread. I pasted them here because they were a bit off topic within the set /a thread.
Also, here I've pasted your code into the cmd prompt.
Here are two command prompts, one with Lucida Console and one with raster fonts.
It might not make any difference, but which font are you using, and do you also get the display that picture shows when you paste it into the command prompt?
Here are images for when I paste it into Notepad, save it, and then run TYPE on it.
Note: aGerman mentioned two places where codepages are set. Whether that makes a difference to the above I don't know, probably not, though it does matter for some things; I'd have to check back on what he showed me, since the posts cover it.
Also, I didn't save it as Unicode, which I suppose I should have.
Are you also using the Unicode switch (/U) on cmd.exe? That's another thing aGerman mentioned. I haven't used it in the screenshots. I haven't really tried the two codepage settings or the Unicode switch since the last time this was discussed, though I may look back at it.
Here is the file saved as Unicode (as opposed to ANSI, which is Notepad's default).
And that big thick bar thing is funny business, because it's not even the broken bar character; it's 0xDD. So a broken bar pasted in while the console is set to raster fonts gets converted to that.
Re: a little unicode related subtopic
Hi Taripo, what a coincidence.
Well, it looks like a small a with an underscore (ª), but that is just how it displays.
I only care whether the code is affected and whether forcing codepage 850 makes my code work.
You reported that it didn't work on your OS with your codepage, and that you solved it with an underscore.
I want to make it work for everyone with this '¦' delimiter.
Re: a little unicode related subtopic
By the way, Ed, until you made your post, this thread was continuing here:
encodings
viewtopic.php?f=3&t=2550
Re: a little unicode related subtopic
Ed Dyreen wrote:
Taripo reported codepage problems in this topic [...]
Aacini wrote: My PIPE.COM program is only 69 bytes in size, so it is a very good replacement for FINDSTR in these cases. If you are worried about where to get my program from, here it is; just copy the 69 bytes below to a file named PIPE.COM and it is ready to run:
ë2´)€ì!Í!ŠÐŠà€Ä!€ü.u.€þ+u)R²A€ê!´#€ì!Í!Z´#€ì!Í!Šò€Æ!´,€ì!Í!"ÀuôLÍ!ëã
I also read about the .COM files that can be created with batch, but this requires me to save the batch file as Unicode; otherwise I get a warning about unsupported characters. But if I do that, my batch scripts won't execute! They only run if I save them as ANSI!
So how could that ever work?
I haven't followed the old thread, but as far as copy/pasting extended characters into .com executables goes, also consider viewtopic.php?p=12950#p12950, or, more to the point, do _not_ do it unless you completely and positively understand the mechanics and possible pitfalls. You certainly do _not_ want to save it as Unicode, and saving as "ANSI" may or may not work depending on your codepage settings versus the original poster's, and on your chosen editor's behavior.
Cheers,
Liviu
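Liviu's warning comes down to bytes. A .COM file only works if the bytes on disk are exactly the program's bytes, and pasted text always passes through the editor's encoding. The sketch below uses the first four characters of the posted program. It assumes the forum rendered the program's bytes as cp1252 characters, which is plausible because the first byte then comes out as 0xEB, the x86 short-JMP opcode:

```python
# First four characters of the 69-byte program as they appear in the post.
text = "ë2´)"

# Saved as ANSI on a cp1252 system: one byte per character.
ansi_bytes = text.encode("cp1252")

# Saved as "Unicode" in Notepad: UTF-16LE with an FF FE byte-order mark.
utf16_bytes = b"\xff\xfe" + text.encode("utf-16-le")

# The same ANSI bytes read on a cp850 system spell different characters.
as_seen_in_850 = ansi_bytes.decode("cp850")

# Some of the characters have no slot at all in cp437.
try:
    text.encode("cp437")
    fits_437 = True
except UnicodeEncodeError:
    fits_437 = False

print("ANSI bytes:   ", ansi_bytes.hex())
print("Unicode bytes:", utf16_bytes.hex())
print("ANSI bytes read as cp850:", as_seen_in_850)
print("representable in cp437:", fits_437)
```

Saving as Unicode turns every character into two bytes and prepends FF FE, so the file is no longer the program. Saving as ANSI reproduces the program only when the editor's code page is the one the bytes were rendered in.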