strLen boosted

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: strLen boosted

#16 Post by jeb » 14 Jan 2011 06:17

Nice,

but I see one thing to optimize.
The appending of the "gauging"-helper can be done without a FOR-LOOP.
Not very much, but if you do some million tests ...

Code:

:strLen string len -- returns the length of a string
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,8) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
)
REM ##### HERE IS THE DIFFERENCE
(
    set str=!str:~%len%,-1!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000
    set /a len+=0x!str:~0x1FF,1!!str:~0xFF,1!
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b
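
For readers who have not met the lookup trick used in the last block, here is a minimal sketch of it in isolation. It assumes a string of at most 15 characters, so a single hex digit is enough; the names `sample` and `probe` are made up for this illustration and are not part of the function above.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Illustrative only: measure a string of at most 15 characters
rem with a one-digit lookup string (hypothetical variable names).
set "sample=Hello"
rem Append the digits in descending order; the sample pushes them to the right.
set "probe=!sample!FEDCBA9876543210"
rem The character at offset 15 is now the hex digit equal to the sample length.
set /a "len=0x!probe:~15,1!"
echo Length of "!sample!" is !len!
endlocal
```

With the five-character sample this reports a length of 5. The function above appears to apply the same idea with a 512-character lookup string, reading one hex digit at offset 0x1FF and one at offset 0xFF.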

sowgtsoi
Posts: 8
Joined: 23 Oct 2010 10:14

Re: strLen boosted

#17 Post by sowgtsoi » 16 Jan 2011 17:03

Ooops, apparently the word 'gauge' is a false friend ; it appears that I should have used the word 'dipstick' to convey what I had in mind ; I've corrected it in the preceding posts.
(By the way, I know of no preexisting name for the technique you employed and it reminds me of when one has to gauge the oil level of a car by dipping a fixed length stick in the liquid and directly assessing the result, hence this choice when I had to name it).



Well .. we have a case here where differing amounts of comments lead to opposite conclusions when the speed is tested :
- your code -- as-is (minus the separating comment) -- is faster in all cases (at a good 95% of the previous time),
- your code -- prepared for a possible inclusion (changed date, site URL) -- is slower in all cases (at a small 105%).

The second case is more rigorous (same amounts of comments in the functions compared) but that it's not faster is indeed unintuitive considering the results of this test :

test_fornofor.cmd

Code:

@echo off
setlocal enabledelayedexpansion
set a=0
echo:Testing ... (takes a few seconds)

call :getTod tcs_1
for /L %%i in (1,1,100000) do ( for %%j in (!a!) do set a=%%j )
call :getTod tcs_2
for /L %%i in (1,1,100000) do ( set a=%a% )
call :getTod tcs_3

set /a   "tcs_for=tcs_2-tcs_1" & if   !tcs_for! LSS 0 set /a   tcs_for+=8640000
set /a "tcs_nofor=tcs_3-tcs_2" & if !tcs_nofor! LSS 0 set /a tcs_nofor+=8640000

echo:   With a 'for' : %tcs_for%cs.
echo:Without a 'for' : %tcs_nofor%cs. (should be faster)

endlocal
pause
@echo on
@exit /b


:: less locale dependent version of getTod
:: adapted from http://www.dostips.com/DtCodeFunctions.php#_Toc128586395
::
:getTod -- get a Time of Day value in 1/100th seconds
::   -- %~1: out - time of day
SETLOCAL
set t=%time: =0%
set /a t=((1%t:~0,2%*60+1%t:~3,2%)*60+1%t:~6,2%)*100+1%t:~9,2%-36610100
( ENDLOCAL & REM RETURN VALUES
    IF "%~1" NEQ "" SET %~1=%t%
)
GOTO:EOF

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: strLen boosted

#18 Post by jeb » 26 Jan 2011 16:56

Ok, +-5% is not really important.

Therefore I built a much better variant, with only one FOR-LOOP :wink:
And in your test suite it is always faster.
To better compare the variants, I removed the comment blocks
and always moved the EXIT /b into the last block.

strLenN_binarySplit.bat

Code:

:strLen string len -- returns the length of a string via binary search
(   
   setlocal EnableDelayedExpansion
    set "s=!%~1!#"
    set "len=0"
    for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
      if "!s:~%%P,1!" NEQ "" (
         set /a "len+=%%P"
         set "s=!s:~%%P!"
      )
   )
)
(
   endlocal
    set "%~2=%len%"
   exit /b
)
::End of function


It tests like the normal binary test, but in the positive case the string is reduced by the new value.
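
A small trace may make the "reduce the string on a hit" step easier to follow. This is only a sketch on an 11-character sample with a shortened list of powers; the echo lines are added for illustration and are not part of the function.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sketch only: trace the binary split on an 11-character sample.
rem The trailing # plays the same role as in the function above.
set "s=Hello World#"
set "len=0"
for %%P in (8 4 2 1) do (
    if "!s:~%%P,1!" NEQ "" (
        rem A character exists at offset P: count P characters and drop them.
        set /a "len+=%%P"
        set "s=!s:~%%P!"
        echo step %%P: hit,  len=!len!, rest=[!s!]
    ) else (
        echo step %%P: miss, len=!len!, rest=[!s!]
    )
)
echo Final length: !len!
endlocal
```

The steps 8, 2 and 1 hit and step 4 misses, so the final length is 8+2+1 = 11. Because the string shrinks after every hit, each test only ever needs the plain offset P, and no second FOR is required to expand an accumulated offset.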

btw. (with or without the ,1)

Code:

if "!str:~%%P,1!" NEQ ""
if "!str:~%%P!" NEQ ""
seem to run at nearly the same speed, even if str is long. :o

hope it helps
jeb

sowgtsoi
Posts: 8
Joined: 23 Oct 2010 10:14

Re: strLen boosted

#19 Post by sowgtsoi » 23 Feb 2011 16:52

(tests' sources in the next post)

(Using «,1» alleviates the need to worry about the command line limit. One less thing to think about ? No impact on speed !? I like !)


Wow, astute ! 'strLenN_binarySplit'(.txt) manages to complete in a good 90% of the time taken by the already refined 'strLenL_dipstick' !

I've retrofitted 'strLenL_dipstick' with your enhancements to produce 'strLenO_dipstick_powers' which in turn completes in :
- a small 95% of the time taken by 'strLenN_binarySplit',
- a good 85% of the time taken by 'strLenL_dipstick',
- a small 75% of the time taken by 'strLenJ_20101116', the current version of «:strLen».

All these improvements lead me to propose this function for inclusion :

Code:

:strLen string len -- returns the length of a string
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:: Many thanks to 'sowgtsoi', but also 'jeb' and 'amel27' dostips forum users helped making this short and efficient
:$created 20081122 :$changed 20110220 :$categories StringOperation
:$source http://www.dostips.com
( SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for %%P in (4096 2048 1024 512 256) do (
        if "!str:~%%P,1!" NEQ "" (
            set /a "len+=%%P"
            set "str=!str:~%%P!"
        )
    )
    set str=!str:~1!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000
    set /a "len+=0x!str:~0x1FF,1!!str:~0xFF,1!"
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
    EXIT /b
)
:mrgreen: .. which makes the function three times faster than the original 'strLenA_loopUnrolled' !


Thanks for your insights !


Edits : grammar.
Last edited by sowgtsoi on 23 Feb 2011 17:26, edited 2 times in total.

sowgtsoi
Posts: 8
Joined: 23 Oct 2010 10:14

Re: strLen boosted

#20 Post by sowgtsoi » 23 Feb 2011 16:56

To test on an XP system or higher, copy the following files into the dedicated folder described in the second post of this thread ("strLen_tests_stub.txt", "strLenA_loopUnrolled.txt", "strLenJ_20101116.txt" and "strLenL_dipstick.txt" should already be present there).

Notes :
| I've realized that using the variable name "s" was causing a problem here.
| In "strLen_tests_stub.txt", «s» also appears in the body of the testing function that calls «:strLen».
| As far as I can tell, the interpreter is not confused by variable shadowing ; things proceed correctly.
| What escapes me is that the homonymy results in a slight speed boost.
| The problem is that it will have no particular reason to happen in the wild.
| (*cough* 'strLenI_chunks' *cough*).
| This slight boost is significant in that it's of the same magnitude as the improvements made at this stage of «:strLen»'s streamlining.
| In some sense it's good news : this sensitivity suggests that «:strLen» is now only a tad removed from perfection !
|
| For the sake of accuracy, the last functions are tested "all things otherwise equal" : same amounts of comments (none), same variables where relevant, etc.



strLen_even_more_tests.cmd

Code:

@echo off
:: 2011-01-08
SETLOCAL ENABLEDELAYEDEXPANSION

:: always working in the directory of this script :
%~d0 & cd "%~dp0"

:: setup :
if not exist logs\ mkdir logs
if exist tmp\ rmdir /S /Q tmp
mkdir tmp
for %%v in (
    A_loopUnrolled
    J_20101116
    L_dipstick
    N_binarySplit
    O_dipstick_powers
    ) do (
    type "strLen_tests_stub.txt" "strLen%%v.txt" > "tmp\test_strLen%%v.cmd"
) 1>nul 2>nul

:: syntactic sugar :
set      "test= call ^"tmp\test_strLen^^!version^^!.cmd^" "
set   "do_test= "
set "skip_test= if 0==1 "


%do_test% (echo:&echo:&ver&echo:Unit test :
    set "version=A_loopUnrolled"   &%test% unittest
    set "version=J_20101116"       &%test% unittest
    set "version=L_dipstick"       &%test% unittest
    set "version=N_binarySplit"    &%test% unittest
    set "version=O_dipstick_powers"&%test% unittest
) >con
::>"logs\strLen_unittest.log"
::>con


%skip_test% (echo:&echo:&ver&echo:Points of interest :
    set "version=A_loopUnrolled"   &%test% correctness    0_start 8_span
    set "version=J_20101116"       &%test% correctness    0_start 8_span
                                    %test% correctness 1020_start 8_span
                                    %test% correctness 7164_start 8_span
    set "version=L_dipstick"       &%test% correctness    0_start 8_span
                                    %test% correctness 1020_start 8_span
                                    %test% correctness 7164_start 8_span
    set "version=N_binarySplit"    &%test% correctness    0_start 8_span
                                    %test% correctness 1020_start 8_span
                                    %test% correctness 7164_start 8_span
    set "version=O_dipstick_powers"&%test% correctness    0_start 8_span
                                    %test% correctness 1020_start 8_span
                                    %test% correctness 7164_start 8_span
) >con
::>"logs\strLen_correctness_partial.log"
::>con


%do_test% (echo:&echo:&ver&echo:Limits :
    set "version=A_loopUnrolled"   &%test% correctness 1020_start 8_span
    set "version=J_20101116"       &%test% correctness 8183_start 8_span
    set "version=L_dipstick"       &%test% correctness 8183_start 8_span
    set "version=N_binarySplit"    &%test% correctness 8183_start 8_span
    set "version=O_dipstick_powers"&%test% correctness 8183_start 8_span
) >con
::>"logs\strLen_correctness_limits.log"
::>con


:: with all versions, takes 3 good minutes at 1.7 GHz :
%skip_test% (echo:&echo:&ver&echo:Comprehensive testing :
    set "version=A_loopUnrolled"   &%test% correctness 0_start 1030_span
    set "version=J_20101116"       &%test% correctness 0_start 8200_span
    set "version=L_dipstick"       &%test% correctness 0_start 8200_span
    set "version=N_binarySplit"    &%test% correctness 0_start 8200_span
    set "version=O_dipstick_powers"&%test% correctness 0_start 8200_span
) >"logs\strLen_correctness_full_test.log"
::>"logs\strLen_correctness_full_test.log"
::>con


:: with all versions, takes 6 good minutes at 1.7 GHz :
%skip_test% (echo:&echo:&echo:&ver&echo:Speed comparisons :>con
    echo:&set "version=A_loopUnrolled"   &%test% speed    0_start  250_span 8_times
    rem "warm up" done.
                                          %test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
    echo:&set "version=J_20101116"       &%test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
                                          %test% speed 1000_start 1000_span 2_times
                                          %test% speed 2000_start 1000_span 2_times
                                          %test% speed 7000_start 1000_span 2_times
    echo:&set "version=L_dipstick"       &%test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
                                          %test% speed 1000_start 1000_span 2_times
                                          %test% speed 2000_start 1000_span 2_times
                                          %test% speed 7000_start 1000_span 2_times
    echo:&set "version=N_binarySplit"    &%test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
                                          %test% speed 1000_start 1000_span 2_times
                                          %test% speed 2000_start 1000_span 2_times
                                          %test% speed 7000_start 1000_span 2_times
    echo:&set "version=O_dipstick_powers"&%test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
                                          %test% speed 1000_start 1000_span 2_times
                                          %test% speed 2000_start 1000_span 2_times
                                          %test% speed 7000_start 1000_span 2_times
    rem to check consistency, the first test is repeated :
    echo:&set "version=A_loopUnrolled"   &%test% speed    0_start  250_span 8_times
) >>"logs\strLen_speed_comparisons.log"
::>>"logs\strLen_speed_comparisons.log"
::>con


:: cleaning :
if exist tmp\ rmdir /S /Q tmp
echo:
echo:
echo:strLen tests : completed.
pause
ENDLOCAL
@echo on
@EXIT /b



strLenN_binarySplit.txt

Code:

:strLen
( SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=!%~1!#"
    set "len=0"
    for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if "!str:~%%P,1!" NEQ "" (
            set /a "len+=%%P"
            set "str=!str:~%%P!"
        )
    )
)
( ENDLOCAL
    IF "%~2" NEQ "" SET /a %~2=%len%
    EXIT /b
)



strLenO_dipstick_powers.txt

Code:

:strLen
( SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"
    set "len=0"
    for %%P in (4096 2048 1024 512 256) do (
        if "!str:~%%P,1!" NEQ "" (
            set /a "len+=%%P"
            set "str=!str:~%%P!"
        )
    )
    set str=!str:~1!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000
    set /a "len+=0x!str:~0x1FF,1!!str:~0xFF,1!"
)
( ENDLOCAL
    IF "%~2" NEQ "" SET /a %~2=%len%
    EXIT /b
)



And that's it !

plp626
Posts: 5
Joined: 17 Apr 2009 00:36
Location: China

Re: strLen boosted

#21 Post by plp626 » 05 Apr 2011 12:35

very nice!

the list
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
........
3333333333333333222222222222222211111111111111110000000000000000

It's a bit long. I replaced it with FEDCBA9876543210 and got this code, which is shorter and still efficient.


Code:

:strlen
setlocal enabledelayedexpansion
set "$=!%~1!#"
set N=&for %%a in (4096 2048 1024 512 256 128 64 32 16)do if !$:~%%a^,1!. NEQ . set/aN+=%%a&set $=!$:~%%a!
set $=!$!fedcba9876543210&set/aN+=0x!$:~16,1!

endlocal&If %2. neq . (set/a%2=%N%)else echo %N%

Queue
Posts: 31
Joined: 16 Feb 2013 14:31

Re: strLen boosted

#22 Post by Queue » 20 Feb 2013 03:19

After searching for strlen threads, this one seems the most appropriate place to put this. Building on all previous work in this thread (so very little here is my work), here's my take on strlen:

Code:

:strlen
(   setlocal enabledelayedexpansion & set /a "}=0"
    if defined %~1 (
        for %%# in (4096 2048 1024 512 256 128 64 32 16) do (
            if "!%~1:~%%#,1!" neq "" set "%~1=!%~1:~%%#!" & set /a "}+=%%#"
        )
        set "%~1=!%~1!10000000000000000FEDCBA987654321" & set /a "}+=0x!%~1:~16,1!!%~1:~32,1!"
    )
)
endlocal & if "%~2" neq "" set /a "%~2=%}%" & exit /b

I apologize for it still being a little scrunched up but it should be somewhat readable.

Via tests on my computer, this came out as 1% to 17% (for short to long strings) faster than strLenO_dipstick_powers which was the previous fastest when I tested everything in this thread. For short strings, the difference may fall to random variation and timing inaccuracy, but there were nice savings in speed with massive strings.

Since we're using setlocal anyway, I don't copy the string to a new env var and instead just use the setlocal copy directly which saves some time; it's as much as ~17% faster for a maximum length string (on my computer). Since I'm not sticking a safety character onto the front or back of the string, the ''dipstick'' has to be slightly bulkier to account for a potential 16 character leftover and I use an if defined to check for a zero length string.
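
To see why the bulkier dipstick copes with a leftover of 1 to 16 characters, here is a minimal sketch that applies only the final lookup from the code above to three sample leftovers. The variable names `rest` and `n` are made up for this illustration.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sketch only: the final two-digit lookup, applied to leftovers of 1, 15 and 16 characters.
for %%S in (a abcdefghijklmno abcdefghijklmnop) do (
    set "rest=%%S10000000000000000FEDCBA987654321"
    rem Offset 16 yields the high hex digit, offset 32 the low hex digit.
    set /a "n=0x!rest:~16,1!!rest:~32,1!"
    echo %%S has length !n!
)
endlocal
```

For the three samples the lookup reads 0x01, 0x0F and 0x10, that is 1, 15 and 16. The leading 1 of the dipstick is only reached at offset 16 when the leftover is exactly 16 characters long, which is the case a single hex digit could not express.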

I didn't get speed differences between if not a==b and if a neq b so I left it as neq based on precedent set in this thread.

Almost all of the speed improvement came from the lack of the initial string copy, so this could surely be improved upon; coping with the loss of the safety character is primarily what needs to be worked around to improve it.

Edit - Yes, it's obvious that the if defined will have bad behavior if the call sends an empty first argument. Throw an if "%~1" neq "" before the if defined %~1 for safety if it's a concern. Hm, thinking on it, a pre-setlocal abort if argument 1 is empty or the var isn't defined might be worth it (Edit 3 - It's not worth it; another block of code that has to be parsed separately slows it down too much. Now I see why things turned out the way they did. -_-).

Edit 2 - There are some minor structural flaws in the return that would also choke on bad arguments. Guess this needs more work. Regardless, I think there's merit in not initially copying the string data.

Edit 4 - Ok, maybe this instead:

Code:

:strlen
(   setlocal enabledelayedexpansion & set /a "}=0"
    if "%~1" neq "" if defined %~1 (
        for %%# in (4096 2048 1024 512 256 128 64 32 16) do (
            if "!%~1:~%%#,1!" neq "" set "%~1=!%~1:~%%#!" & set /a "}+=%%#"
        )
        set "%~1=!%~1!0FEDCBA9876543211" & set /a "}+=0x!%~1:~32,1!!%~1:~16,1!"
    )
)
endlocal & set /a "%~2=%}%" & exit /b
Set actually gives us good feedback in the case of an empty %2. It honestly seems like a waste to sanity check it. I'd also like to note this function doesn't spit out a garbage return (often 4100 in previous implementations) on a failed env var creation (due to oversized input env vars) and can list string length up to 8189 (and reports 8189 for any string longer than 8189, at least in the test framework). Oh, and don't pass this function cmdcmdline without first ''stabilizing'' it (set cmdcmdline=%cmdcmdline%). %cmdcmdline% is actually the reason I started looking at a strlen function.

Queue

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: strLen boosted

#23 Post by Liviu » 20 Feb 2013 17:12

Queue wrote:Regardless, I think there's merit in not initially copying the string data [...] and can list string length up to 8189 (and reports 8189 for any string longer than 8189, at least in the test framework)

That's good thinking, and the accurate count all the way up is a plus, too.

Since this is a subject that never seems to grow old ;-) here are 2 more cents on it. I believe most algorithms are covered between this thread and Dave's roundup at http://ss64.org/viewtopic.php?pid=6478#p6478. However, there is one less obvious difference between them, worth IMHO noting in the context of raw performance.

A couple of those algorithms are easily amenable to work without temporary variables. This is significant because it means that - when called from code that has enableDelayedExpansion already - the function does not in fact need its own setlocal block. While it's normally assumed (and true) that the penalty of a nested setlocal is minimal, it still is an overhead, and can become measurable in cases of large environments - discussed for example at http://www.dostips.com/forum/viewtopic.php?f=3&t=2597.

When enableDelayedExpansion can be pre-assumed, these variations on the known themes could be competitive.

Code:

:strlen4.edx  StrVar  RtnVar
set /a "%~2=%random%"
echo(!%~1!>"%temp%\strlen!%~2!.tmp"
for %%F in ("%temp%\strlen!%~2!.tmp") do (
  del "%temp%\strlen!%~2!.tmp"
  set /a "%~2 = %%~zF - 2"
)
exit /b

:strlen1.edx  StrVar  RtnVar
@rem use 'if "!%~1!" == ""' if StrVar might contain spaces
set /a "%~2 = 0" & if not defined %~1 exit /b
for %%A in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
  set /a "%~2 |= %%A"
  for %%B in (!%~2!) do if "!%~1:~%%B,1!" == "" set /a "%~2 &= ~%%A"
)
set /a "%~2 += 1" & exit /b

Liviu

Queue
Posts: 31
Joined: 16 Feb 2013 14:31

Re: strLen boosted

#24 Post by Queue » 20 Feb 2013 21:08

Wow, if I point one of the temp file variants at my RAM drive, it is hilariously fast; I don't think any of the in-batch processing variants can beat that.

Very good point though: when we can expect delayedexpansion to be enabled, a function that works on the string directly, non-destructively, gets a big leg up. The binary search should give consistent results regardless of how polluted the env var space is. The temp file variant should as well, but only consistent to a particular drive (and to some tiny extent the file system); I'm sure that's one of the common arguments against it, though not much different than the performance differences between processors for the temp-file-less variants. They both were certainly quick when I tested them; my temp folder isn't on the speediest drive, so it was a bit slower than any of the other fast variants, but not terrible.

Thanks for the ss64 link. This is fun stuff.

Edit - Why did it become the norm to set a var with the resultant length? It seems like all of the functions use exit /b, so why not just return the length as the errorlevel?

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: strLen boosted

#25 Post by Liviu » 20 Feb 2013 23:10

Queue wrote:Why did it become the norm to set a var with the resultant length? It seems like all of the functions use exit /b, so why not just return the length as the errorlevel?

My guess is convenience. Unless the %errorlevel% were to be used right away, it would then take one extra step in the caller code to save it into a persistent variable.
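
As a sketch of what the errorlevel approach could look like, including the extra step in the caller: the label `:strlenErr` and the variable names are hypothetical, and the body simply reuses the binary split posted earlier in this thread.

```batch
@echo off
setlocal EnableDelayedExpansion
set "text=dostips"
call :strlenErr text
rem Save the exit code right away; the next command may overwrite it.
set "n=!errorlevel!"
echo Length: !n!
exit /b 0

:strlenErr  StrVar  -- hypothetical name; returns the length as the exit code
setlocal EnableDelayedExpansion
set "s=!%~1!#"
set "len=0"
for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
    if "!s:~%%P,1!" NEQ "" (
        set /a "len+=%%P"
        set "s=!s:~%%P!"
    )
)
rem The value of len is substituted when this line is parsed, before endlocal discards it.
endlocal & exit /b %len%
```

For the seven-character sample this prints "Length: 7". The caller still needs the extra `set` to keep the value, and a length of 0 becomes indistinguishable from the usual "success" code.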

Also, there is the convention to use %errorlevel% as a success/fail indicator. A fully error-checked :strlen4.edx could be written as...

Code:

:strlen4.edx  StrVar  RtnVar  --  be sure to check if the returned errorlevel is 0
if "!" neq "" exit /b 1
set /a "%~2=%random%" || exit /b 2
echo(!%~1!>"%temp%\strlen!%~2!.tmp"|| exit /b 3
for %%F in ("%temp%\strlen!%~2!.tmp") do (
  del "%temp%\strlen!%~2!.tmp"
  set /a "%~2 = %%~zF - 2" && exit /b 0
)
exit /b 4

P.S. Or, just as defensive programming against some doofus having "set errorlevel=123" before "call" ;-)

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: strLen boosted

#26 Post by Aacini » 07 Jul 2024 00:51

I missed this topic when it was published. I didn't know about the "dipstick" method, but it seems very interesting to me.

I devised a new way to make good use of the "dipstick" trick that hasn't been posted before. I think this new form might be the fastest one because it doesn't use a single FOR command and runs around 12 lines in the worst case.

Code:

@echo off
setlocal EnableDelayedExpansion

rem Get the length of a string using a modified form of jeb's "dipstick" method
rem Antonio Perez Ayala

rem Generate an initial string 8170 characters long for tests
set "str=X"
for /L %%i in (1,1,12) do set "str=!str!!str!"
set "str=%str%%str:~0,4074%"

:loop
set "len="
set /P "len=Enter the desired length: "
if not defined len goto :EOF
set "test=!str:~0,%len%!"
call :APAdipstick test length=
echo Calculated length = %length%
echo/
goto loop


:APAdipstick stringVar len=
setlocal EnableDelayedExpansion

set "len=0"
set "s=!%1!"
if not defined s goto dipstickEnd
set "dipstick=FEDCBA9876543210"

rem First step (fourth one-bit "level") cut the string in a segment 4095 characters long or less
if "%s:~4095,1%" neq "" set "s=%s:~4095%" & set "len=4095"
if not defined s goto dipstickEnd

rem Third level "dipstick" covers a segment of 4095 characters via 15 chunks of 256 chars each
set "len3=%s:~3839,1%%s:~3583,1%%s:~3327,1%%s:~3071,1%%s:~2815,1%%s:~2559,1%%s:~2303,1%%s:~2047,1%%s:~1791,1%%s:~1535,1%%s:~1279,1%%s:~1023,1%%s:~767,1%%s:~511,1%%s:~255,1%%dipstick%"
set /A "len+=len3=0x%len3:~15,1%*256"
set "s=!s:~%len3%!"
if not defined s goto dipstickEnd

rem Second level "dipstick" covers a segment of 255 characters via 15 chunks of 16 chars each
set "len2=%s:~239,1%%s:~223,1%%s:~207,1%%s:~191,1%%s:~175,1%%s:~159,1%%s:~143,1%%s:~127,1%%s:~111,1%%s:~95,1%%s:~79,1%%s:~63,1%%s:~47,1%%s:~31,1%%s:~15,1%%dipstick%"
set /A "len+=len2=0x%len2:~15,1%*16"
set "s=!s:~%len2%!"
if not defined s goto dipstickEnd

rem Original "dipstick" method covers a segment of 15 chunks of 1 character each
set "len1=%s%%dipstick%"
set /A "len+=0x%len1:~15,1%"

:dipstickEnd
endlocal & set "%2=%len%"
exit /B

================================================================================

rem Nested FOR's version, just for fun! ;) (it will be much slower for sure)

:SlowAPAdipstick stringVar len=
setlocal EnableDelayedExpansion

set "len=0"
set "s=!%1!"
if not defined s goto dipstickEnd
set "dipstick=FEDCBA9876543210"

if "%s:~4095,1%" neq "" set "s=%s:~4095%" & set "len=4095"
if not defined s goto dipstickEnd

for %%m in (256 16) do (
   set "chunk=%dipstick%"
   set "n=-1"
   for /L %%# in (1,1,15) do (
      set /A "n+=%%m"
      for %%n in (!n!) do set "chunk=!s:~%%n,1!!chunk!"
   )
   set /A "len+=lenX=0x!chunk:~15,1!*%%m"
   for %%n in (!lenX!) do set "s=!s:~%%n!"
   if not defined s goto dipstickEnd
)

set "len1=%s%%dipstick%"
set /A "len+=0x%len1:~15,1%"

:dipstickEnd
endlocal & set "%2=%len%"
exit /B
A curious point about this method is that its speed does not depend on how long the string is, but on how close its length is to a given power of two minus one. For example, a string with 4095 characters is the fastest case. The next fastest are the strings whose lengths are multiples of 256 minus one, up to 4095: 3839, 3583, 3327, ..., 767, 511 and 255. After those come the lengths that are multiples of 16 minus one, up to 255: 239, 223, 207, ..., 47, 31 and 15 (in each corresponding 256-character segment). The slowest cases are those where the length of the string divided by 16 leaves a nonzero remainder.

Interesting! Isn't it?

Antonio

einstein1969
Expert
Posts: 960
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: strLen boosted

#27 Post by einstein1969 » 16 Jul 2024 14:35

Very interesting, I'm doing some speed testing.

This is a first version. I still have to turn it into a macro to see its actual speed, which the overhead of the call keeps hidden from me.

Code:

:strlen_small str
	set "$=A!%1!"
	if "!$:~4095,1!" neq "" (set "$=!$:~4095!" & set "$len=4094") else set "$len=-1"
	if "!$:~255,1!" neq "" (
		set "$t=!$:~3839,1!!$:~3583,1!!$:~3327,1!!$:~3071,1!!$:~2815,1!!$:~2559,1!!$:~2303,1!!$:~2047,1!!$:~1791,1!!$:~1535,1!!$:~1279,1!!$:~1023,1!!$:~767,1!!$:~511,1!!$:~255,1!FEDCBA9876543210"
		set /A "$len+=$t=0x!$t:~15,1!*256"
		for %%$ in (!$t!) do set "$=!$:~%%$!"
	)
	if defined $ (
		set "$t=!$:~239,1!!$:~223,1!!$:~207,1!!$:~191,1!!$:~175,1!!$:~159,1!!$:~143,1!!$:~127,1!!$:~111,1!!$:~95,1!!$:~79,1!!$:~63,1!!$:~47,1!!$:~31,1!!$:~15,1!FEDCBA9876543210"
		set /A "$t=0x!$t:~15,1!*16"
		for %%$ in (!$t!) do set "$=!$:~%%$!FEDCBA9876543210" 
		set /a $len+=$t+0x!$:~15,1!
	)
goto :eof
I have reduced the operations where possible and I have also inserted an if "!$:~255,1!" neq "" to skip the part of the code that is not needed for short strings. You partitioned very well. In my opinion, the most useful part is the one that handles strings with a length between 0 and 255. I optimized that part because such strings are more likely. But we can do better.
