Char/String Compare

einstein1969
Expert
Posts: 960
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Char/String Compare

#1 Post by einstein1969 » 02 Oct 2013 11:39

Hi,

How does IF work when comparing characters?

Code: Select all

@echo off & setlocal EnableDelayedExpansion

   chcp 850

   call :test

   chcp 437

   call :test

goto :eof

:test

   call :check "A" "A"

   call :check "A" "B"

   call :check "A" "a"

   call :check "A" "b"

   call :check "A" "í"

   call :check "A" "î"

   call :check "A" "~"

   call :check "A" "€"

goto :eof

:check s1 s2

   set "s1=%~1"
   set "s2=%~2"


   if !s1! gtr !s2! (echo "!s1!" gtr "!s2!"
      ) else if !s1! lss !s2! (echo "!s1!" lss "!s2!"
         ) else if !s1! equ !s2! (echo "!s1!" = "!s2!"
            ) else (echo ?)

goto :eof


Result:

Code: Select all

E:\x264\provini>tmp2
Tabella codici attiva: 850
"A" = "A"
"A" lss "B"
"A" gtr "a"
"A" lss "b"
"A" lss "Ý"
"A" gtr "¯"
"A" gtr "~"
"A" lss "Ç"
Tabella codici attiva: 437
"A" = "A"
"A" lss "B"
"A" gtr "a"
"A" lss "b"
"A" lss "φ"
"A" lss "ε"
"A" gtr "~"
"A" lss "Ç"


Einstein1969

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Char/String Compare

#2 Post by penpen » 02 Oct 2013 15:42

This is not easy to answer in general.
You can compute an order matrix using this batch script, but you need an additional tool to create the binary file "ascii.dat":

Code: Select all

@echo off
cls
setlocal enableDelayedExpansion

(
   set /P "ascii="
) < "ascii.dat"
set "hexDigits=0123456789ABCDEF"

rem echo("!ascii:~255,1!"


(
   for /L %%h in (0,1,15) do (
   for /L %%l in (0,1,15) do (
      set "a=0x!hexDigits:~%%h,1!!hexDigits:~%%l,1!"
      for %%a in (!a!) do set c=!ascii:~%%a,1!

      if "%%h%%l" == "09" (
         set "line=!a!  |"
      ) else (
         set "line=!a! !c!|"
      )

      if "%%h%%l" == "00" (
         set "line0=      |"
         set "linex=      |"
         set "lineH=      |"
         set "lineL=      |"
         set "lineASCII=      |"
         set "lineSpace=      |"
         set "line=------+"

         for /L %%H in (0,1,15) do (
         for /L %%L in (0,1,15) do (
            if NOT "%%H%%L" == "00" (
               set "b=0x!hexDigits:~%%H,1!!hexDigits:~%%L,1!"
               for %%b in (!b!) do set d=!ascii:~%%b,1!

               set "line0=!line0!0"
               set "linex=!linex!x"
               set "lineH=!lineH!!hexDigits:~%%H,1!"
               set "lineL=!lineL!!hexDigits:~%%L,1!"

               if "%%H%%L" == "09" (
                  set "lineASCII=!lineASCII! "
               ) else (
                  set "lineASCII=!lineASCII!!d!"
               )

               set "lineSpace=!lineSpace! "
               set "line=!line!-"
            )
         )
         )

         echo(!line0!
         echo(!linex!
         echo(!lineH!
         echo(!lineL!
         echo(!lineSpace!
         echo(!lineASCII!
         echo(!line!

         set "line0="
         set "linex="
         set "lineH="
         set "lineL="
         set "lineSpace="
         set "lineASCII="
         set "line="
      ) else (
         for /L %%H in (0,1,15) do (
         for /L %%L in (0,1,15) do (
            if NOT "%%H%%L" == "00" (
               set "b=0x!hexDigits:~%%H,1!!hexDigits:~%%L,1!"
               for %%b in (!b!) do set d=!ascii:~%%b,1!

               if !c! LSS !d! (
                  set "line=!line!-"
               ) else if !c! GTR !d! (
                  set "line=!line!+"
               ) else if !c! EQU !d! (
                  set "line=!line!0"
               ) else (
                  set "line=!line!_"
               )
            )
         )
         )
         echo(!line!
      )
   )
   )

) > "ASCII.txt"

endlocal
goto :eof

Maybe someone else here knows how to create it easily; if not, you need a hex editor to create it by hand.
It should contain a space followed by the bytes 0x01 to 0xFF in increasing order.
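
For readers without a hex editor, the layout described above can be sketched outside of batch. This is only an illustration in Python, not part of the batch solution; the function names `build_ascii_bytes` and `write_ascii_dat` are made up for this example.

```python
from pathlib import Path


def build_ascii_bytes() -> bytes:
    """Return the 256 bytes described above.

    Index 0 holds a space (0x20) as a stand-in for the 0x00 byte;
    indices 1..255 hold the byte values 0x01..0xFF in increasing order.
    """
    return bytes([0x20]) + bytes(range(1, 256))


def write_ascii_dat(path: str = "ascii.dat") -> int:
    """Write the table in binary mode and return the number of bytes written."""
    return Path(path).write_bytes(build_ascii_bytes())
```

Calling `write_ascii_dat()` creates `ascii.dat` in the current directory; with this layout the byte at offset N is the character with code N, except for offset 0.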

The resulting ASCII.txt contains the order matrix, where:
- an entry is + if the left char is greater than the upper char,
- an entry is - if the left char is less than the upper char,
- an entry is 0 if the left char is equal to the upper char.
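
The matrix layout can be sketched in a few lines of Python. Note the assumption: the comparison used here (`codepoint_compare`, a made-up helper) simply compares code points, so it does not reproduce the order that IF uses; it only shows how the +, - and 0 entries are produced from any three-way comparison.

```python
from typing import Callable, List, Sequence


def order_matrix(chars: Sequence[str],
                 compare: Callable[[str, str], int]) -> List[str]:
    """Build one row per 'left' char and one column per 'upper' char."""
    rows = []
    for left in chars:
        row = ""
        for upper in chars:
            result = compare(left, upper)
            if result > 0:
                row += "+"   # left char is greater than upper char
            elif result < 0:
                row += "-"   # left char is less than upper char
            else:
                row += "0"   # both chars compare equal
        rows.append(row)
    return rows


def codepoint_compare(a: str, b: str) -> int:
    """Stand-in comparison (assumption): plain code point order."""
    return (ord(a) > ord(b)) - (ord(a) < ord(b))
```

For example, `order_matrix("Aab", codepoint_compare)` yields the rows `0--`, `+0-` and `++0`. Replacing `codepoint_compare` with a function that calls the real comparison would give the matrix for that comparison.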

The matrix is too big to post as is, so sorry for such a complicated approach.

penpen

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Char/String Compare

#3 Post by penpen » 02 Oct 2013 17:34

If .NET is installed on your system then this may compile an executable that creates the needed ascii.dat file:

Code: Select all

// // >nul 2> nul & @goto :main
/*
:main
   @echo off
   setlocal
   cls

   set "csc="

   pushd "%SystemRoot%\Microsoft.NET\Framework"
   for /f "tokens=* delims=" %%i in ('dir /b /o:n "v*"') do (
      dir /a-d /b "%%~fi\csc.exe" >nul 2>&1 && set "csc="%%~fi\csc.exe""
   )
   popd

   if defined csc (
      echo most recent C#.NET compiler located in:
      echo %csc%.
   ) else (
      echo C#.NET compiler not found.
      goto :eof
   )

   
   %csc% /nologo /target:exe /out:"%~dpn0.exe" "%~0"
   goto :eof
*/
using System;
using System.IO;



class AsciiFile {
   public static int Main (string [] args) {
      byte [] buffer = new byte [256];

      buffer [0] = (byte) (32 & 0x000000FF);
      for (int i = 1; i < buffer.Length; ++i) {
         buffer [i] = (byte) (i & 0x000000FF);
      }

      try {
         using (FileStream fs = System.IO.File.Create ("ascii.dat")) {
                 fs.Write (buffer, 0, buffer.Length);
         }

         return 0;
      } catch (ArgumentNullException) {
         System.Console.WriteLine ("Buffer fail: Is null.");
      } catch (IOException) {
         System.Console.WriteLine ("Write fail: IOException.");
      } catch (ObjectDisposedException) {
         System.Console.WriteLine ("Write fail: DisposedException");
      }

      return 1;
   }
}

penpen

einstein1969
Expert
Posts: 960
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: Char/String Compare

#4 Post by einstein1969 » 02 Oct 2013 19:16

Thanks for the reply, penpen.

For now I have created the ascii.dat file without using .NET:

Code: Select all

@echo off & setlocal EnableDelayedExpansion

:: Make an ASCII.DAT file (Windows 7 32-bit)
::
:: Makeascii Filename 00_subst

   If "%~1"=="" (set "N=ASCII.DAT") else set "N=%~1"
   If exist "!N!" (echo File already exist. & goto :eof)
   if "%~2"=="" (set "F=00") else set "F=%~2"

   if Not "%3"=="" goto :%3

   call %0 "!N!" "!F!" exec | debug > nul

goto :eof

:exec

   set "E= 0 1 2 3 4 5 6 7 8 9 A B C D E F"
   For %%E in (!E!) do echo(E1%%E0!E: = %%E!
   for %%N in (N!N! RCX 100 E100 !F! W Q) do echo(%%N

goto :eof


EDIT: Smaller version.
EDIT2: Added the option to change the first byte of the sequence. On Windows 7, does the space not work with set /p?


Einstein1969
Last edited by einstein1969 on 03 Oct 2013 12:00, edited 3 times in total.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Char/String Compare

#5 Post by dbenham » 02 Oct 2013 22:10

Many moons ago, I wrote a batch library of routines called CharLib.bat to assist with processing characters within batch. Development of the utility is documented in the thread "new functions: :chr, :asc, :asciiMap". Some characters embedded in the code cause encoding problems when posted on forum sites like this, so I posted the file on a free Google site. The following link will download a file named CharLib_bat.txt: https://sites.google.com/site/dbenhamfi ... ib_bat.txt

Rename the file to CharLib.bat, and it is ready to use. Documentation is built into the script.

Once you have CharLib.bat, then the following batch script can be used to generate a list of 255 extended ASCII characters, sorted in the collation sequence of a given code page. The script takes a code page number as the one and only parameter. It creates a file named sorted_nnn.txt, where nnn is the code page. It also prints the result to the screen.

Code: Select all

@echo off
setlocal enableDelayedExpansion
chcp %1

set "ascii=."
for /l %%N in (1 1 255) do (
  cls
  echo %%N
  call charlib chr %%N c
  set "ascii=!ascii!!c!"
)

for /l %%C in (1 1 255) do (
  cls
  echo %%C
  set "V=00%%C"
  set "V=!V:~-3!"
  set "$!V!=1000"
  for /l %%N in (1 1 255) do if "!ascii:~%%C,1!" gtr "!ascii:~%%N,1!" set /a $!V!+=1
)
cls
call :result >sorted_%1.txt
<sorted_%1.txt findstr "^"
exit /b

:result
chcp
for /f "delims=$=" %%N in ('set $^|sort /+6') do (
  for /f "delims=0 tokens=*" %%n in ("%%N") do (
    echo  chr(%%N^) = [!ascii:~%%n,1!]
  )
)
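
The ranking idea in the script above (count, for each character, how many characters it is greater than, then sort by that count) can be sketched in Python. The comparison `toy_greater` is a made-up stand-in, not the rule cmd.exe applies; it only mimics the "a before A before b" pattern so the technique can be seen in isolation.

```python
from typing import Callable, List, Sequence


def sort_by_rank(chars: Sequence[str],
                 greater: Callable[[str, str], bool]) -> List[str]:
    """Sort using only a 'greater than' test, as the batch script does.

    The rank of a character is the number of characters it is greater than.
    Characters with equal rank keep their original relative order.
    """
    ranked = []
    for position, ch in enumerate(chars):
        rank = sum(1 for other in chars if greater(ch, other))
        ranked.append((rank, position, ch))
    ranked.sort(key=lambda entry: (entry[0], entry[1]))
    return [ch for _, _, ch in ranked]


def toy_greater(a: str, b: str) -> bool:
    """Hypothetical collation: by letter first, lower case before upper case."""
    return (a.lower(), a.isupper()) > (b.lower(), b.isupper())
```

With these definitions, `sort_by_rank("BbAa", toy_greater)` returns `['a', 'A', 'b', 'B']`. The approach needs one comparison per pair of characters (255 x 255 in the script), which is why the batch version takes a while to run.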

Here is the sorted output for code page 437. I'm not sure if all the characters post properly to the forum:

Code: Select all

Active code page: 437
 chr(032) = [ ]
 chr(255) = [ ]
 chr(009) = [   ]
 chr(010) = [
]
 chr(011) = [♂]
 chr(012) = [♀]
]chr(013) = [
 chr(033) = [!]
 chr(001) = [☺]
 chr(002) = [☻]
 chr(003) = [♥]
 chr(004) = [♦]
 chr(005) = [♣]
 chr(006) = [♠]
 chr(007) = []
 chr(008) = ]
 chr(014) = [♫]
 chr(015) = [☼]
 chr(016) = [►]
 chr(017) = [◄]
 chr(018) = [↕]
 chr(019) = [‼]
 chr(020) = [¶]
 chr(021) = [§]
 chr(022) = [▬]
 chr(023) = [↨]
 chr(024) = [↑]
 chr(025) = [↓]
 chr(026) = [→]
 chr(027) = [←]
 chr(028) = [∟]
 chr(029) = [↔]
 chr(030) = [▲]
 chr(031) = [▼]
 chr(127) = [⌂]
 chr(039) = [']
 chr(045) = [-]
 chr(034) = ["]
 chr(035) = [#]
 chr(036) = [$]
 chr(037) = [%]
 chr(038) = [&]
 chr(040) = [(]
 chr(041) = [)]
 chr(042) = [*]
 chr(044) = [,]
 chr(046) = [.]
 chr(047) = [/]
 chr(058) = [:]
 chr(059) = [;]
 chr(063) = [?]
 chr(064) = [@]
 chr(091) = [[]
 chr(092) = [\]
 chr(093) = []]
 chr(094) = [^]
 chr(095) = [_]
 chr(096) = [`]
 chr(123) = [{]
 chr(124) = [|]
 chr(125) = [}]
 chr(126) = [~]
 chr(173) = [¡]
 chr(168) = [¿]
 chr(155) = [¢]
 chr(156) = [£]
 chr(157) = [¥]
 chr(158) = [₧]
 chr(043) = [+]
 chr(249) = [∙]
 chr(060) = [<]
 chr(061) = [=]
 chr(062) = [>]
 chr(241) = [±]
 chr(174) = [«]
 chr(175) = [»]
 chr(246) = [÷]
 chr(251) = [√]
 chr(239) = [∩]
 chr(247) = [≈]
 chr(240) = [≡]
 chr(243) = [≤]
 chr(242) = [≥]
 chr(169) = [⌐]
 chr(244) = [⌠]
 chr(245) = [⌡]
 chr(254) = [■]
 chr(196) = [─]
 chr(205) = [═]
 chr(179) = [│]
 chr(186) = [║]
 chr(218) = [┌]
 chr(213) = [╒]
 chr(214) = [╓]
 chr(201) = [╔]
 chr(191) = [┐]
 chr(184) = [╕]
 chr(183) = [╖]
 chr(187) = [╗]
 chr(192) = [└]
 chr(212) = [╘]
 chr(211) = [╙]
 chr(200) = [╚]
 chr(217) = [┘]
 chr(190) = [╛]
 chr(189) = [╜]
 chr(188) = [╝]
 chr(195) = [├]
 chr(198) = [╞]
 chr(199) = [╟]
 chr(204) = [╠]
 chr(180) = [┤]
 chr(181) = [╡]
 chr(182) = [╢]
 chr(185) = [╣]
 chr(194) = [┬]
 chr(209) = [╤]
 chr(210) = [╥]
 chr(203) = [╦]
 chr(193) = [┴]
 chr(207) = [╧]
 chr(208) = [╨]
 chr(202) = [╩]
 chr(197) = [┼]
 chr(216) = [╪]
 chr(215) = [╫]
 chr(206) = [╬]
 chr(223) = [▀]
 chr(220) = [▄]
 chr(221) = [▌]
 chr(222) = [▐]
 chr(219) = [█]
 chr(176) = [░]
 chr(177) = [▒]
 chr(178) = [▓]
 chr(170) = [¬]
 chr(248) = [°]
 chr(230) = [µ]
 chr(250) = [·]
 chr(048) = [0]
 chr(172) = [¼]
 chr(171) = [½]
 chr(049) = [1]
 chr(050) = [2]
 chr(253) = [²]
 chr(051) = [3]
 chr(052) = [4]
 chr(053) = [5]
 chr(054) = [6]
 chr(055) = [7]
 chr(056) = [8]
 chr(057) = [9]
 chr(236) = [∞]
 chr(097) = [a]
 chr(065) = [A]
 chr(166) = [ª]
 chr(160) = [á]
 chr(133) = [à]
 chr(131) = [â]
 chr(132) = [ä]
 chr(142) = [Ä]
 chr(134) = [å]
 chr(143) = [Å]
 chr(145) = [æ]
 chr(146) = [Æ]
 chr(098) = [b]
 chr(066) = [B]
 chr(099) = [c]
 chr(067) = [C]
 chr(135) = [ç]
 chr(128) = [Ç]
 chr(100) = [d]
 chr(068) = [D]
 chr(101) = [e]
 chr(069) = [E]
 chr(130) = [é]
 chr(144) = [É]
 chr(138) = [è]
 chr(136) = [ê]
 chr(137) = [ë]
 chr(102) = [f]
 chr(070) = [F]
 chr(159) = [ƒ]
 chr(103) = [g]
 chr(071) = [G]
 chr(104) = [h]
 chr(072) = [H]
 chr(105) = [i]
 chr(073) = [I]
 chr(161) = [í]
 chr(141) = [ì]
 chr(140) = [î]
 chr(139) = [ï]
 chr(106) = [j]
 chr(074) = [J]
 chr(107) = [k]
 chr(075) = [K]
 chr(108) = [l]
 chr(076) = [L]
 chr(109) = [m]
 chr(077) = [M]
 chr(110) = [n]
 chr(252) = [ⁿ]
 chr(078) = [N]
 chr(164) = [ñ]
 chr(165) = [Ñ]
 chr(111) = [o]
 chr(079) = [O]
 chr(167) = [º]
 chr(162) = [ó]
 chr(149) = [ò]
 chr(147) = [ô]
 chr(148) = [ö]
 chr(153) = [Ö]
 chr(112) = [p]
 chr(080) = [P]
 chr(113) = [q]
 chr(081) = [Q]
 chr(114) = [r]
 chr(082) = [R]
 chr(115) = [s]
 chr(083) = [S]
 chr(225) = [ß]
 chr(116) = [t]
 chr(084) = [T]
 chr(117) = [u]
 chr(085) = [U]
 chr(163) = [ú]
 chr(151) = [ù]
 chr(150) = [û]
 chr(129) = [ü]
 chr(154) = [Ü]
 chr(118) = [v]
 chr(086) = [V]
 chr(119) = [w]
 chr(087) = [W]
 chr(120) = [x]
 chr(088) = [X]
 chr(121) = [y]
 chr(089) = [Y]
 chr(152) = [ÿ]
 chr(122) = [z]
 chr(090) = [Z]
 chr(224) = [α]
 chr(226) = [Γ]
 chr(235) = [δ]
 chr(238) = [ε]
 chr(233) = [Θ]
 chr(227) = [π]
 chr(229) = [σ]
 chr(228) = [Σ]
 chr(231) = [τ]
 chr(237) = [φ]
 chr(232) = [Φ]
 chr(234) = [Ω]


Dave Benham

einstein1969
Expert
Posts: 960
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: Char/String Compare

#6 Post by einstein1969 » 03 Oct 2013 11:51

@penpen

Good diagram! Unfortunately, I cannot understand the logic behind the "dos IF" :twisted:

@dbenham

Beautiful work. Apart from the character for ASCII code 0, everything seems to be there. At this point, with a little effort, it would be possible to write binary data into an environment variable (for example, for compression) or to write a binary file. Am I right?

Einstein1969

EDIT: I have edited my batch file to add the option of changing the first character. On Windows 7, does the space not work with set /P?

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Char/String Compare

#7 Post by dbenham » 03 Oct 2013 15:10

einstein1969 wrote:Apart from the character for ASCII code 0, everything seems to be there. At this point, with a little effort, it would be possible to write binary data into an environment variable (for example, for compression) or to write a binary file. Am I right?

Absolutely.

Unfortunately, I don't think anyone has come up with a way to use native batch commands to write code 0x00. I believe it may well be impossible.

=================================================================

My prior post results were from a Windows 7 64 bit machine. I decided to run the same test on a virtual XP machine, and I got similar, but different results :!:


Dave Benham

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Char/String Compare

#8 Post by penpen » 03 Oct 2013 15:54

dbenham wrote:Unfortunately, I don't think anyone has come up with a way to use native batch commands to write code 0x00. I believe it may well be impossible.

I don't know if you would accept this as a pure batch solution, as it relies on an existing file containing a null byte, but you could do something like this:
You can use set /P with > and >> to add non-0x00 characters to a file a.bin, and whenever you need a 0x00 character (stored in the file 0x00.bin) you could do this:

Code: Select all

copy /B a.bin + /B 0x00.bin a.bin

This way you can write all characters to a file.
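
What the binary copy does can be shown with a small Python sketch: the contents of the second file are appended byte for byte to the first, with no text-mode translation. The helper name `append_file` is made up for this illustration; the file names match the ones used above.

```python
from pathlib import Path


def append_file(target: Path, addition: Path) -> None:
    """Append the raw bytes of 'addition' to the end of 'target'."""
    # "ab" opens for appending in binary mode, so a 0x00 byte is written
    # unchanged and nothing is added or stripped at the end.
    with target.open("ab") as out:
        out.write(addition.read_bytes())
```

If a.bin holds the two bytes `AB` and 0x00.bin holds a single null byte, then `append_file(Path("a.bin"), Path("0x00.bin"))` leaves a.bin with the three bytes `41 42 00`.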

penpen

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Char/String Compare

#9 Post by penpen » 04 Oct 2013 11:37

einstein1969 wrote:Unfortunately, I cannot understand the logic behind the "dos IF" :twisted:
I'm not sure what you mean.
If you want to know why the characters are put in the order we see here, then I can only speculate, as I don't know.

One reason may be that MS has created an order for all glyphs that may be used by the DOS shell, with no dependencies on any other resource, so there is no inner logic behind that order (I doubt that, but it could be).

Another reason might be hidden in the character representation.
A character is just a semantic definition, for example: "underscore".
To simplify their usage, each character is assigned a unique codepoint, for example: codepoint ("underscore") = 0x5F.
As this is a bijective mapping, you could also say it the other way around: character (0x5F) = "underscore".
A set of such assignments is called a character code/character set/character mapping/..., for example: ASCII.

Example:
ASCII: {0x00 : 0x7F} --> { ..., "underscore", ... }
..., ASCII (0x5F) := "underscore", ...

But with these definitions you can't see anything. You need some graphics, called glyphs, that you can use when you want to display a character.
A set of glyphs is a font. They are stored in a font file, but as a font is not forced to define glyphs for all characters, each glyph has an index number and a codepoint number, for example glyph_and_codepoint_of_font[0] := (graphic of (_), 0x5F).
Such a mapping is called a codepage. But this name is also in use for:
- a mapping from each codepoint to a glyph index, and
- a mapping from each codepoint to a glyph.
This has historical reasons, as the glyphs were stored in RAM or in BIOS ROM/RAM depending on the system, and were simply replaced when the codepage was changed.
When Windows started using Unicode for character representation, they changed this definition step by step, and I assume they also changed the implementation of the chcp executable.

So nowadays, every time a DOS shell reads a byte value that represents a character, it does something like this:
- interprets this byte as an ANSI codepoint
- converts this ANSI codepoint to a Unicode codepoint
- looks up the Unicode codepoint using the current codepage to get a font's glyph index
- finally renders the glyph at this glyph index
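
The first two steps can be illustrated outside of the shell with Python's built-in codecs. This sketch rests on an assumption that nobody in this thread has confirmed: that the batch file from post #1 was saved in Windows-1252, so that "í", "î" and "€" are stored as the bytes 0xED, 0xEE and 0x80. The function name `reinterpret` is made up for this example.

```python
def reinterpret(text: str, saved_as: str, code_page: str) -> str:
    """Encode text the way the file was saved, then decode the same bytes
    the way the console's active code page would interpret them."""
    raw = text.encode(saved_as)      # bytes as stored in the batch file
    return raw.decode(code_page)     # characters seen under that code page


def byte_value(text: str, saved_as: str) -> int:
    """Return the single byte that stores a one-character string."""
    raw = text.encode(saved_as)
    if len(raw) != 1:
        raise ValueError("expected a single-byte character")
    return raw[0]
```

Under that assumption, `reinterpret("í", "cp1252", "cp850")` gives "Ý" and `reinterpret("î", "cp1252", "cp850")` gives "¯", while code page 437 turns the same two bytes into "φ" and "ε"; "€" becomes "Ç" under both. These are the characters shown in the results of post #1, which suggests that IF compared different characters after each chcp, not the same character under a different order.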

Back to the character order that IF is using:
The shell may use any of these numbers to define the order (ANSI codepoint, Unicode codepoint, glyph index).
It could also use the indices of the transformations themselves, for example:
The index of the assignment [ASCII (0x5F) := "underscore"] could be any number in [0x00 : 0x7F], and need not be identical to the codepoint, similar to glyph indices.
It could also be any transformation of the above, for example:
Let's assume that the index of the assignment [ASCII (0x5F) := "underscore"] is 0x04.
Then the ordering key may be the value 0x5F04.

I myself assume that the DOS shell uses the indices of the glyphs to order the characters, but I can't prove it.
One hint is that the order differs between machines, where the fonts may have been rebuilt for performance reasons, so that other indices are used to order the characters.
But this hint would also be consistent with many other possible character orders.

penpen

einstein1969
Expert
Posts: 960
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: Char/String Compare

#10 Post by einstein1969 » 04 Oct 2013 15:00

Thanks, penpen, for your explanation.

It is an excellent starting point for further investigation, should that become necessary. The sorted map by dbenham (although it gives less insight into the cause) satisfies the need to make IF usable; that is, you can use it to build a predictable algorithm. Your matrix allows us to deepen our understanding further.

Thanks a lot for your contribution.

Einstein1969
