toLower Name dependent?

Posted: 06 May 2011 16:19
by WernerGg
Look at this little program with the standard toLower function from the library:

Code:

@echo off
rem Test toLower

set str=Hugo
echo.
echo str=%str%
call :toLower str
echo str=%str%

set alt2=Hugo
echo.
echo alt2=%alt2%
call :toLower alt2
echo alt2=%alt2%

set alt=Hugo
echo.
echo alt=%alt%
call :toLower alt
echo alt=%alt%

Exit /b

:toLower str -- converts uppercase character to lowercase
::           -- str [in,out] - valref of string variable to be converted
:$created 20060101 :$changed 20080219 :$categories StringManipulation
:$source http://www.dostips.com
if not defined %~1 EXIT /b
for %%a in ("A=a" "B=b" "C=c" "D=d" "E=e" "F=f" "G=g" "H=h" "I=i"
            "J=j" "K=k" "L=l" "M=m" "N=n" "O=o" "P=p" "Q=q" "R=r"
            "S=s" "T=t" "U=u" "V=v" "W=w" "X=x" "Y=y" "Z=z" "Ä=ä"
            "Ö=ö" "Ü=ü") do (
    call set %~1=%%%~1:%%~a%%
)
EXIT /b


It produces:

Code:

str=Hugo
str=hugo

alt2=Hugo
alt2="Ü=ü"lt2:Ü=ü

alt=Hugo
alt="Ü=ü"lt:Ü=ü

Strange. The result seems to depend on the name of the variable.

Re: toLower Name dependent?

Posted: 06 May 2011 20:46
by !k
Add setlocal/endlocal:

Code:

setlocal
if not defined %~1 EXIT /b
for %%a in ("A=a" "B=b" "C=c" "D=d" "E=e" "F=f" "G=g" "H=h" "I=i"
            "J=j" "K=k" "L=l" "M=m" "N=n" "O=o" "P=p" "Q=q" "R=r"
            "S=s" "T=t" "U=u" "V=v" "W=w" "X=x" "Y=y" "Z=z" "Ä=ä"
            "Ö=ö" "Ü=ü") do (
    call set %~1=%%%~1:%%~a%%
)
endlocal
EXIT /b

Re: toLower Name dependent?

Posted: 07 May 2011 04:13
by WernerGg
Sorry, but I'm afraid I'm totally confused.
The setlocal/endlocal bracket does not and cannot work: it changes the string only within toLower's local environment, not in the caller's context.

I think it must be:

Code:

:: Correct version
:toLower str -- converts uppercase character to lowercase
::           -- str [in,out] - valref of string variable to be converted
:$created 20060101 :$changed 20080219 :$categories StringManipulation
:$source http://www.dostips.com
setlocal enabledelayedexpansion
if not defined %~1 EXIT /b
set str=%~1
call set "str=%%%str%%%"
for %%a in ("A=a" "B=b" "C=c" "D=d" "E=e" "F=f" "G=g" "H=h" "I=i"
            "J=j" "K=k" "L=l" "M=m" "N=n" "O=o" "P=p" "Q=q" "R=r"
            "S=s" "T=t" "U=u" "V=v" "W=w" "X=x" "Y=y" "Z=z" "Ä=ä"
            "Ö=ö" "Ü=ü") do (
    set str=!str:%%~a!
)
(endlocal
   set %~1=%str%
   exit /b
)


BTW: This toLower algorithm is very expensive. Basically it is a double loop: an outer loop over all 29 alphabetic characters, and for each of these 29 steps an inner loop over all characters of the whole string during !str:%%~a!, followed by an assignment in set str=!str:%%~a!. Most of that effort changes nothing, but it is all necessary.

Nobody would program like that in a standard programming language. One would have just one loop over the characters of the string and use the ASCII value of each character as an index into a 256-entry toLower translation array.

The latter might be 50 or even 100 times faster. Is there no way to get such an algorithm working in batch?
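For comparison, here is a minimal Python sketch (not batch) of the two strategies being contrasted: the 29-pass substitution used by :toLower and a single pass through a lookup table. Python's str.maketrans/str.translate play the role of the translation array; since Python strings are Unicode, the table is keyed by code point rather than being a fixed 256-entry array. The names PAIRS, to_lower_substitution and to_lower_table are illustrative only.

```python
# The 29 mappings used by :toLower: A-Z plus the German umlauts.
PAIRS = [(chr(c), chr(c + 32)) for c in range(ord("A"), ord("Z") + 1)]
PAIRS += [("Ä", "ä"), ("Ö", "ö"), ("Ü", "ü")]


def to_lower_substitution(s):
    """Batch-style: one full-string replace per mapping, 29 passes in total."""
    for upper, lower in PAIRS:
        s = s.replace(upper, lower)
    return s


# Table-driven: build the lookup table once, then map every character in one pass.
TABLE = str.maketrans(dict(PAIRS))


def to_lower_table(s):
    """One pass over s; each character is looked up in TABLE, unmapped ones pass through."""
    return s.translate(TABLE)


if __name__ == "__main__":
    for word in ("Hugo", "ÄRGER", "Über"):
        print(word, to_lower_substitution(word), to_lower_table(word))
```

Both functions give the same result; the difference is only in how many passes over the string they make.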

At least with %str:~%%i,1% we can get each character of the string. But how to iterate over the string (without calling :strLen beforehand) and how to simulate array indexing efficiently?

Re: toLower Name dependent? - proposed bug/limitation fix

Posted: 09 May 2011 22:24
by dbenham
Good catch WernerGg - you have discovered a limitation/bug in the existing :toLower code and come up with a working fix.

Given that your modified code is already using SETLOCAL ENABLEDELAYEDEXPANSION, the following two lines:

Code:

set str=%~1
call set "str=%%%str%%%"

can be replaced by the much simpler:

Code:

set "str=!%~1!"


Also your final assignment in the ENDLOCAL block should be enclosed in quotes in case of special characters:

Code:

    set "%~1=%str%"

There still can be problems if the string contains both quotes and special characters, but the enclosing quotes are a worthwhile improvement.


The original :toLower fails with any variable whose name starts with a lowercase <a>, because the leading <a> is interpreted as the %%a of the for loop. You can see what I mean by setting ECHO ON before the for loop.
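The effect can be reproduced outside cmd.exe with a deliberately simplified Python model of the two expansion steps involved (this is not cmd.exe's real parser). First, when the FOR block is read, %% becomes % and %~1 becomes the variable name; then, on every iteration, %X and %~X references to active FOR variables are replaced. Only the plain %X and quote-stripping %~X forms are modelled, and the helper names percent_phase and for_substitute are hypothetical.

```python
BODY = "call set %~1=%%%~1:%%~a%%"  # loop body of the original :toLower


def percent_phase(line, arg1):
    """Simplified phase 1 in a batch file: %% -> %, %~1 -> first argument."""
    out, i = [], 0
    while i < len(line):
        if line.startswith("%%", i):
            out.append("%")
            i += 2
        elif line.startswith("%~1", i):
            out.append(arg1)
            i += 3
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


def for_substitute(line, active):
    """Replace %X and %~X for each active FOR variable X (simplified model)."""
    out, i = [], 0
    while i < len(line):
        nxt = line[i + 1:i + 2]
        if line[i] == "%" and nxt == "~" and line[i + 2:i + 3] in active:
            out.append(active[line[i + 2]].strip('"'))  # ~ removes the quotes
            i += 3
        elif line[i] == "%" and nxt in active:
            out.append(active[nxt])
            i += 2
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for name in ("str", "alt"):
        line = percent_phase(BODY, name)
        print(line, "->", for_substitute(line, {"a": '"A=a"'}))
```

For str the CALL then performs the intended %str:A=a% substitution. For alt the %a at the start of %alt is consumed as the loop variable, which is why each iteration assigns mangled text such as "Ü=ü"lt:Ü=ü instead of a lowercased string.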

The same problem exists for the existing :toUpper and :toCamelCase functions. I think all three functions should be updated with variants of your fix.


WernerGg wrote: BTW: This toLower algorithm is very expensive. Basically it is a double loop: an outer loop over all 29 alphabetic characters, and for each of these 29 steps an inner loop over all characters of the whole string during !str:%%~a!, followed by an assignment in set str=!str:%%~a!. Most of that effort changes nothing, but it is all necessary.

Nobody would program like that in a standard programming language. One would have just one loop over the characters of the string and use the ASCII value of each character as an index into a 256-entry toLower translation array.

Ahh, but DOS is not a normal programming language! If you compare how long it takes to process a short string vs. a long string, there is barely a difference. That is because the loop iteration count is fixed at 29 and the string substitution uses compiled, optimized code that is much faster than the interpreted batch language it is embedded in. Switching to an iteration over the characters within the string will actually slow the function down, especially as the string grows, even though logically it should be more efficient. The situation is made worse because there are no built-in functions to determine the length of a string or the ASCII value of a character.
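The same trade-off shows up in other interpreted languages. The following Python sketch is only an analogy, not a batch measurement: it times a fixed 29 calls to the built-in, C-implemented str.replace against an interpreted per-character loop. No timings are claimed here, since they depend on the machine and interpreter version; the function names are illustrative.

```python
import timeit

PAIRS = [(chr(c), chr(c + 32)) for c in range(ord("A"), ord("Z") + 1)]
PAIRS += [("Ä", "ä"), ("Ö", "ö"), ("Ü", "ü")]
LOOKUP = dict(PAIRS)


def lower_by_replace(s):
    """Fixed 29 iterations; the per-character work happens inside str.replace."""
    for upper, lower in PAIRS:
        s = s.replace(upper, lower)
    return s


def lower_by_char_loop(s):
    """One interpreted loop iteration per character of s."""
    out = []
    for ch in s:
        out.append(LOOKUP.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    for length in (5, 50, 5000):
        text = ("HuGo" * length)[:length]
        t_replace = timeit.timeit(lambda: lower_by_replace(text), number=200)
        t_loop = timeit.timeit(lambda: lower_by_char_loop(text), number=200)
        print(f"len={length:5d}  replace={t_replace:.4f}s  char loop={t_loop:.4f}s")
```

Comparing the short and long rows is one way to look at both sides of the argument: how much the replace version's cost grows with length, and how much its fixed overhead matters for short keys.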

Dave Benham

Re: toLower Name dependent?

Posted: 10 May 2011 00:53
by WernerGg
Thanks Dave for clarifying what's happening within the original toLower code. That's a funny effect with variable names starting with a.

Of course I'll adopt your enhancements to the code. Thanks.

I do not agree with you on all points of the effort analysis. It is true that for longer strings the disadvantage of the replace mechanism shrinks. But for short strings its disadvantage is a factor of 29+. And who converts long strings to lowercase? The typical applications are sorting, searching and lookup, and these work with keys, i.e. short strings (some 1-10 characters).

Anyway, the batch language does not have the means for a straightforward algorithm, so the one we have is about the best.

Maybe one could try to break out of the for loop when there are no more uppercase characters, and arrange the substitutions according to their frequency distribution (vowels first). But it is probably not worth the pain, because detecting "no more uppercase" is expensive as well.
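The early-exit idea can be sketched in Python to see where the cost moves. The vowels-first ORDER below is a hypothetical ordering, not a measured frequency table, and to_lower_early_exit is an illustrative name; the sketch counts how many substitutions actually run.

```python
import re

# Hypothetical ordering: vowels first, then umlauts, then consonants.
# This is not a measured letter-frequency table.
ORDER = "AEIOU" + "ÄÖÜ" + "BCDFGHJKLMNPQRSTVWXYZ"
PAIRS = [(upper, upper.lower()) for upper in ORDER]
ANY_UPPER = re.compile("[A-ZÄÖÜ]")


def to_lower_early_exit(s):
    """Substitute pair by pair, stopping once no uppercase letter is left.

    Returns the converted string and the number of substitutions performed.
    """
    passes = 0
    for upper, lower in PAIRS:
        if not ANY_UPPER.search(s):  # this check is itself a scan of s
            break
        s = s.replace(upper, lower)
        passes += 1
    return s, passes


if __name__ == "__main__":
    for word in ("hugo", "Hugo", "ÄRGER"):
        print(word, to_lower_early_exit(word))
```

For "Hugo" the loop stops after the H pass (14 of 29 substitutions with this ordering), but every iteration pays for an extra ANY_UPPER scan, which is the cost WernerGg expects to eat up the savings.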

Re: toLower Name dependent?

Posted: 16 May 2011 13:59
by Ed Dyreen
First of all, sorry for my English.

I probably don't understand the problem very well and am missing the point,
but if I want to do a ToLower conversion I use something like:

Code:

@echo off &SetLocal EnableExtensions EnableDelayedExpansion

set "LCaseString=a b c d e f g h i j k l m n o p q r s t u v w x y z"

set $=HELLO

for %%! in ( !LCaseString! ) do set "$=!$:%%!=%%!!"

echo.$=!$!_
pause
exit

This takes advantage of the fact that FOR is case-insensitive.
The output should be:
$=hello_
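A small Python model of Ed's loop, assuming that cmd.exe's %var:old=new% and !var:old=new! substitution matches old case-insensitively and inserts new literally (as the thread later establishes, the case-insensitivity belongs to SET's substitution, not to FOR). cmd_substitute and ed_to_lower are hypothetical names, and only ASCII letters are modelled.

```python
import re
import string


def cmd_substitute(value, old, new):
    """Model of %var:old=new%: case-insensitive search, literal replacement."""
    if not old:  # empty search string: left unchanged in this model
        return value
    return re.sub(re.escape(old), lambda _match: new, value, flags=re.IGNORECASE)


def ed_to_lower(value):
    """Ed's loop: replace every letter with its own lowercase form."""
    for letter in string.ascii_lowercase:
        value = cmd_substitute(value, letter, letter)
    return value


if __name__ == "__main__":
    print(ed_to_lower("HELLO"))
```

Because each a=a style substitution also matches the uppercase letter, replacing a letter "with itself" is enough to lowercase it.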

I could not test the Ü to ü conversion, probably because I use a Dutch OS :shock:
I can't even echo that symbol; it just displays a square ?!

Re: toLower Name dependent?

Posted: 17 May 2011 03:57
by WernerGg
@Ed
I think this is basically the same algorithm as above. It is written a bit more compactly and is more difficult to understand, but it does exactly the same thing (maybe even a little bit more, since it needlessly replaces lowercase characters as well).

Of course one could easily add the German umlauts, and that would work.

Re: toLower Name dependent?

Posted: 17 May 2011 06:51
by Ed Dyreen
@WernerGg
I learned that the FOR command is case-insensitive. You say it needlessly replaces lowercase characters, which is true, but if my theory that FOR is case-insensitive is correct, the original version with its "A=a" pairs will also needlessly replace characters that are already lowercase. So:

Code:

set "LCaseString=a b c d e f g h i j k l m n o p q r s t u v w x y z"
set $=HELLO_needlesly
for %%! in ( !LCaseString! ) do set "$=!$:%%!=%%!!"

needlessly replaces the already lowercase letters of 'needlesly', but:

Code:

set $=HELLO_needlesly
for %%a in ("A=a" "B=b" "C=c" "D=d" "E=e" "F=f" "G=g" "H=h" "I=i"
            "J=j" "K=k" "L=l" "M=m" "N=n" "O=o" "P=p" "Q=q" "R=r"
            "S=s" "T=t" "U=u" "V=v" "W=w" "X=x" "Y=y" "Z=z" "Ä=ä"
            "Ö=ö" "Ü=ü") do (
    set $=!$:%%~a!
)

also needlessly replaces them, since FOR can't tell the difference between 'A' and 'a', etc.

Code:

@echo off &SetLocal EnableExtensions EnableDelayedExpansion

set "$=HELLO"
set "$=!$:O=o!"
echo.$=!$!

set "$=HELLO"
set "$=!$:o=o!"
echo.$=!$!

pause
exit

In both cases the output is '$=HELLo': the second replacement should not occur, but it does!

Re: toLower Name dependent?

Posted: 17 May 2011 11:42
by WernerGg
Hmm, I think all this has nothing to do with the FOR loop and its case (in)sensitivity, as your last example shows. It is a question of the cmd.exe implementation of

Code:

set y=%x:str1=str2%

Obviously this implementation is wrong and, from a performance point of view, badly done, since the program senselessly converts x and str1 to lowercase before searching for occurrences of str1 within x.
Hence I suppose that your implementation with

Code:

set y=%x:o=o%

ends up in exactly the same instructions as

Code:

set y=%x:O=o%

But your program works only because of that confused cmd.exe-programmer (God rest his soul)
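WernerGg's guess about the internals can be written down as a Python sketch: search a lowercased copy of the value for the lowercased search string, and splice the replacement into the original. Whether cmd.exe really works this way is not established in the thread; the sketch only reproduces the observable behaviour for ASCII text (str.lower can change the length of some non-ASCII strings, which this sketch does not handle). substitute_via_lowercase is a hypothetical name.

```python
def substitute_via_lowercase(value, old, new):
    """Replace every case-insensitive occurrence of old in value with new."""
    haystack, needle = value.lower(), old.lower()
    if not needle:
        return value
    out, start = [], 0
    while True:
        hit = haystack.find(needle, start)
        if hit < 0:
            out.append(value[start:])
            return "".join(out)
        out.append(value[start:hit])  # untouched text keeps its original case
        out.append(new)
        start = hit + len(needle)


if __name__ == "__main__":
    print(substitute_via_lowercase("HELLO", "O", "o"))
    print(substitute_via_lowercase("HELLO", "o", "o"))
```

With this model, substitute_via_lowercase("HELLO", "o", "o") and substitute_via_lowercase("HELLO", "O", "o") both return "HELLo", which is the behaviour Ed observed.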

Re: toLower Name dependent?

Posted: 17 May 2011 11:48
by Ed Dyreen
You are right, my (dumb) mistake, it is not for but set that is case insensitive :roll:

Re: toLower Name dependent?

Posted: 28 May 2011 07:45
by dbenham
OK, now that we know that :toLower, :toUpper and :toCamelCase fail if the string variable name begins with "a", the dostips.com published functions ought to be updated.

An alternative to WernerGg's fix is to simply change the character that is used in the for loops. The perfect solution would be to use a character that cannot be used to begin a variable name, but that can be used in a FOR loop. I'm not aware of such a character. The = cannot be used in either.

The next best thing is to use a character that is highly unlikely to begin a variable name.

One good candidate is the forward slash /. It can be used directly in the FOR loop, but cannot lead a variable name unless the set expression is enclosed in quotes.

Other good candidates are characters that must be escaped in the for loop but also require special handling in the variable name. Examples include:
" ^ & < > |

Dave Benham

Re: toLower Name dependent?

Posted: 28 May 2011 12:16
by WernerGg
Well Dave, I would never program anything that works most of the time but not always.
I think my "fix" is not a fix but the proper coding, and I see no advantage in a second-best solution.

Re: toLower Name dependent?

Posted: 28 May 2011 16:04
by dbenham
It's only an alternative WernerGg, and I would agree an inferior one. But it is still better than the existing published code. I'm just hoping the owner/caretaker of this site picks up on this thread and improves the published code.

BTW, your code does not work all of the time: try calling it on a string containing "A|A"|A,
or call the function while delayed expansion is enabled and the string contains ! and/or ^.

It's all a question of how much extra code anyone wants to put in to handle various contingencies.

Dave

Re: toLower Name dependent?

Posted: 05 Jun 2011 20:02
by dbenham
I take it back - changing the FOR loop variable does not help. The function fails if it is called within a loop and the lead character of the variable name matches the loop variable:

Code:

@echo off
set str=HELLO
for %%s in (1,1,1) do call :toLower str
set str
exit /b

:toLower str -- converts uppercase character to lowercase
::           -- str [in,out] - valref of string variable to be converted
:$created 20060101 :$changed 20080219 :$categories StringManipulation
:$source http://www.dostips.com
if not defined %~1 EXIT /b
for %%a in ("A=a" "B=b" "C=c" "D=d" "E=e" "F=f" "G=g" "H=h" "I=i"
            "J=j" "K=k" "L=l" "M=m" "N=n" "O=o" "P=p" "Q=q" "R=r"
            "S=s" "T=t" "U=u" "V=v" "W=w" "X=x" "Y=y" "Z=z" "Ä=ä"
            "Ö=ö" "Ü=ü") do (
    call set %~1=%%%~1:%%~a%%
)
EXIT /b

Output:

Code:

str=1tr:▄=ⁿ
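Feeding the inner loop line into a simplified Python model of FOR-variable substitution (hypothetical helper; only the %X and %~X forms are handled) shows where the 1 comes from: inside :toLower, the caller's %%s is still active alongside the local %%a, so the %s at the start of %str is replaced by the outer loop value.

```python
def for_substitute(line, active):
    """Replace %X and %~X for each active FOR variable X (simplified model)."""
    out, i = [], 0
    while i < len(line):
        nxt = line[i + 1:i + 2]
        if line[i] == "%" and nxt == "~" and line[i + 2:i + 3] in active:
            out.append(active[line[i + 2]].strip('"'))  # ~ removes the quotes
            i += 3
        elif line[i] == "%" and nxt in active:
            out.append(active[nxt])
            i += 2
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


# Loop body after phase-1 expansion, with %~1 = str (see the code above).
LINE = "call set str=%str:%~a%"

# The caller's %%s (value 1) and the local %%a (first value "A=a") are both active.
RESULT = for_substitute(LINE, {"s": "1", "a": '"A=a"'})

if __name__ == "__main__":
    print(RESULT)
```

After the last iteration ("Ü=ü") this becomes the str=1tr:▄=ⁿ shown above. The ▄ and ⁿ are consistent with Ü and ü saved in the Windows-1252 code page but displayed by a console using OEM code page 437.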


Also, WernerGg's alternative is much faster than the original code.

Dave Benham

Re: toLower Name dependent?

Posted: 06 Jun 2011 05:22
by jeb
Hi,

The main problem in the "old" implementation is the line

Code:

call set %~1=%%%~1:%%~a%%

It is very slow and doesn't work with complex string content, as it uses percent expansion.
So it should always be changed to the following (this requires delayed expansion to be enabled):

Code:

set "%~1=!%~1:%%~a!"
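Run through the same kind of simplified Python model (hypothetical helpers; only %%, %~1, %X and %~X are handled), this form leaves the variable name alone: after phase 1 the name sits between ! characters, and FOR substitution only reacts to %, so neither a leading a nor an outer %%s can capture it. Delayed expansion then evaluates !str:A=a! afterwards.

```python
JEB_BODY = 'set "%~1=!%~1:%%~a!"'


def percent_phase(line, arg1):
    """Simplified phase 1 in a batch file: %% -> %, %~1 -> first argument."""
    out, i = [], 0
    while i < len(line):
        if line.startswith("%%", i):
            out.append("%")
            i += 2
        elif line.startswith("%~1", i):
            out.append(arg1)
            i += 3
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


def for_substitute(line, active):
    """Replace %X and %~X for each active FOR variable X (simplified model)."""
    out, i = [], 0
    while i < len(line):
        nxt = line[i + 1:i + 2]
        if line[i] == "%" and nxt == "~" and line[i + 2:i + 3] in active:
            out.append(active[line[i + 2]].strip('"'))  # ~ removes the quotes
            i += 3
        elif line[i] == "%" and nxt in active:
            out.append(active[nxt])
            i += 2
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    active = {"s": "1", "a": '"A=a"'}  # outer %%s and local %%a both active
    for name in ("alt", "str"):
        print(for_substitute(percent_phase(JEB_BODY, name), active))
```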

However, it shows a new effect.

Code:

@echo off
setlocal EnableDelayedExpansion
for %%s in ("s-content") do (
   echo loop1 loopvar-s=%%s
   call :test
)
exit /b

:test
echo test1 loopvar-s=%%s
for %%a in ("a-content") do (
   echo loop2 loopvar-a=%%a
   echo loop2 loopvar-s=%%s
)
exit /b


Now you can see that even in a subfunction you can access the caller's FOR loop variables,
provided you start your own local FOR loop.

jeb