strLen boosted

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Message
Author
sowgtsoi
Posts: 8
Joined: 23 Oct 2010 10:14

strLen boosted

#1 Post by sowgtsoi » 24 Oct 2010 16:46

Hi !

I was wondering how to retrieve the length of a string ; I've seen «:strLen» (http://www.dostips.com/DtCodeCmdLib.php#Function.strLen) ; yay !


I've also managed to improve it, at least on this configuration : Intel Core Duo T2300 1.66GHz / Windows XPSP3 {5.1.2600}.
The code is towards the end of this post.



Short version :

Perfectly knowing what I was doing from the get-go 8) {...}, I decided to speed up «:strLen» twofold and to remove its 1023 chars limit at that.



Long version :

Reading the code of «:strLen», the redundancy was bugging me, and, cocksure like an idiot, I was thinking :
~ "Why ? Why ? Why not *sim-ply* put a loop here ?!".

Crafting a «for» loop, I was soon st(h)umbling on the «"!str:~%len%!"» thingy with this nasty problem of delayed expansion that can not be nested.
~ "... tee hee ...", I *ne-ver* thought it was possible, promise ! :mrgreen:

Now seeing why a «for» was not used, I was still perplexed by the absence of any loop here, be it of the «goto:loop» kind.
~ "Is it a case of «goto» allergy ? Is it due to the risk of conflict when putting a label in a function ?".

Anyway, I was devising a manual «goto:loop», renaming «:loop» in a manner unlikely to be encountered outside of «:strLen».
This was working correctly .. except for the fact that it was completing in 220% of the time sufficient for the loop-free version.
~ Zing ! "This redundancy : it's .. it's a case of loop-unrolling ! Hats off !".

Abandoning the idea to tweak «:strLen», I was lurking around, improving my DOS-fu, when I saw at line 7 of «:getFunctions» (http://www.dostips.com/DtCodeCmdLib.php#Function.getFunctions) that this «call set» nesting expansion trick could also be used in a way remarkably similar to «!»-style delayed expansion.
~ "Let's see if a «for» is possible after all ...".

Well, I did not manage to use this trick to good effect ; that said, while trying, I found a way to nest delayed expansions of the «!» variety, which permitted to write the sought-after «for», as slow as it was expected to be.
Knowing what to search, I found that I was not the first to have thought about it (http://judago.webs.com/variablecatches.htm#fw-footer) and replaced my clunky «(set tmp=^^!str:~!len!^^!) & for %%t in ("!tmp!") do if "%%~t"=="" ...» with a leaner «for %%l in ("!len!") do if "!str:~%%~l!"=="" ...».

Now, this «for» is anything but slow : it happens to complete in a good 50% of the time taken by the loop-unrolled implementation of «:strLen».
Maybe that internal jumps are way cheaper than manual ones ; this cannot account for the improvement compared to the loop-unrolling though.
Maybe that some step of the interpreting needs to occur once only for the body of a «for» kind of loop.

With this fancy «for» came an index that was usable to directly compute the notional value of the binary search guess, thus allowing to eliminate the variable «n» and the need to access it ; things then completed in 90% of the previous time.

Positing that strings in the wild tend to be short rather than long, I skewed the search in guessing by quarters rather than halves but this was producing disappointing results speed wise.

All these percents, compounded, made «:strLen» complete in a small 50% of the time taken the loop-unrolled way.


While I was at it, I also focused on the maximum length limit of 1023 characters.
It was tempting to simply increase the number of iterations, and cursory tests were successful, but the limit was still a property of the function, not the language.
Instead, I chose to push the maximum close to whatever limit is inherent to the underlying machinery at hand by counting the number of full chunks of 1024 characters and letting the binary search occur on the rest.
This way, the new version should work wherever the loop-unrolled version works, the search domain being kept untouched.

Attempt 1/4)
I knew that a usual «for» loop can be exited through a «goto» and now being aware that such loops are faster than those of the «goto:loop» kind, I first tried to combine an infinite «for /l %%i in (0,0,1) do ...» with a «goto:break» ... the script stalled. In fact, if a «goto:break» does redirect the flow outside, some echoing shows that this occurs only after the loop has iterated as many times as instructed to, no matter what. The post-«goto:break» iterations do not seem to have any other effects than to take time and to suggest the avoidance of «goto»s in «for»s.

Attempt 2/4)
Falling back to a manual «goto:loop», I obtained something that was working well. Despite the overhead of counting the chunks, the new «:strLen» was completing in 60% of the time taken by the loop-unrolled version.

Attempt 3/4)
Due to the «goto:loop», a label was now embedded in «:strLen». Not really a practical problem, but still ; I entertained the idea of trying a recursive approach, sort of. In fact, I zeroed in on looping directly on the function's label. This required to protect the instructions standing between the label «:strLen» and the body of the loop : they had to be run once only per call from outside «:strLen». Things got a bit dirty. Just before looping, I was appending the string "local_to_strLen" to the semantically numeric chunks' counter : no such thing ought to happen outside of «:strLen», which provides a way to discriminate outside calls and inside jumps. The instructions to protect were placed in a block «(...)» guarded by an «if» : they were to be run only on the condition that "local_to_strLen" was not a suffix of the chunks' counter. When jumped to from inside, the chunks' counter is properly conditioned and the instructions protected / when called from outside, either a homonym of the chunks' counter exists or this variable name is unassigned, anyway neither case exhibits such a suffix.
Again, it worked well .. and somehow quicker than with {Attempt 2} for strings of the first chunk where it was hovering under 90% of the previous time.
But why ? After all, it had more housekeeping to do.
Apparently, the mere fact of enclosing some instructions in a block «(...)» makes them faster :shock:, even when delayed expansion is not involved, which nonetheless sees its general importance accrued. The speed tests consisting of executing «:strLen» repeatedly, it seems that some cacheability is improved. This puts the «for»-related gain in speed under a new light too.
Whatever.

Attempt 4/4)
I put this fresh awareness to good use on {Attempt 2} ...
Enclosing as much as possible of «:strLen»'s body between parentheses worked OK and fast. I was not comfortable with letting a label in a block though, so split it in two, just to see ; it worked even faster.
This enclosing led to a decrease of the time taken by {Attempt 2} to a small 85% !. give or take 5 points depending on the lengths of the tested strings. Which, compared to {Attempt 3}, made it complete as fast for short strings and in 70% of the time for long strings.

Ultimately, this version was/is completing in a good 50%, overhead included, of the time of its loop-unrolled counterpart.



Having read these rules (http://www.dostips.com/DtGeneTermsOfUse.php), I submit this code to your attention :

Code: Select all

:strLen string len -- returns the length of a string via binary search
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:$created 20081122 :$changed 20101020 :$categories StringOperation
:$source http://www.dostips.com
:: the binary search occurs after all full chunks of 1024 characters (1 KiC) have been counted
:: this enables the measurement of lengths nearly as great as the command line limit
:: the shortness of the names s for "string" and r for "remainder" helps in this matter
:: some blocks seem superfluous, they improve speed
:: when initializing s, keep the A up front to ensure we get the length and not the upper bound
::                      it also avoids trouble in case of empty string
( SETLOCAL ENABLEDELAYEDEXPANSION
    set s=A!%~1!
    set nbKiC=0
)
:strLen#repeat_until_less_than_1_KiC_remains_to_search_through
(       set r=!s:~1024!
        if not "!r!"=="" (
            set /a nbKiC+=1
            set s=!r!
            goto:strLen#repeat_until_less_than_1_KiC_remains_to_search_through
        )
    set len=0
    for /l %%e in (9,-1,0) do (
        set /a "len|=1<<%%e" & for %%l in ("!len!") do if "!s:~%%~l!"=="" set /a "len&=~(1<<%%e)"
    )
    set /a "len+=nbKiC<<10"
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



Side notes :
  • While lurking here, I've seen a previous version (http://www.dostips.com/DtCodeFunctions.php#_Toc128586394) of «:strLen» (http://www.dostips.com/DtCodeCmdLib.php#Function.strLen).
  • Tests of correctness and speed are provided in the next post.
  • Testing required a less locale dependent version of «:getTod» (http://www.dostips.com/DtCodeFunctions.php#_Toc128586395), here is it :

    Code: Select all

    :getTod -- get a Time of Day value in 1/100th seconds
    ::   -- %~1: out - time of day
    SETLOCAL
    set t=%time: =0%
    set /a t=((1%t:~0,2%*60+1%t:~3,2%)*60+1%t:~6,2%)*100+1%t:~9,2%-36610100
    ( ENDLOCAL & REM RETURN VALUES
        IF "%~1" NEQ "" SET %~1=%t%
    )
    GOTO:EOF



Thanks for your time !

sowgtsoi
Posts: 8
Joined: 23 Oct 2010 10:14

strLen tests

#2 Post by sowgtsoi » 24 Oct 2010 16:50

To test on a system of the NT line, copy the following files in a dedicated folder.

The first file is intended to be modified at will depending on what is tested.
The others will/should require no tweaking.

The tests of correctness produce results in {OK,FAIL}.
After a comprehensive test of correctness, searching 'FAIL'-without-quotes in an editor is a quick way to spot problems or limitations.



strLen_tests.cmd

Code: Select all

@echo off
:: 2010-10-22
SETLOCAL ENABLEDELAYEDEXPANSION

:: always working in the directory of this script :
%~d0 & cd "%~dp0"

:: setup :
if not exist logs\ mkdir logs
if exist tmp\ rmdir /S /Q tmp
mkdir tmp
for %%v in (
    A_loopUnrolled
    B_loopGoto
    C_loopFor
    D_loopForJudago
    E_loopForOptim
    F_searchSkewed
    G_chunksAttempt2
    H_chunksAttempt3
    I_chunks
    ) do (
    type "strLen_tests_stub.txt" "strLen%%v.txt" > "tmp\test_strLen%%v.cmd"
) 1>nul 2>nul

:: syntactic sugar :
set      "test= call ^"tmp\test_strLen^^!version^^!.cmd^" "
set   "do_test= "
set "skip_test= if 0==1 "


%do_test% (echo:&echo:&ver&echo:Unit test :
    set "version=A_loopUnrolled"  &%test% unittest
    set "version=B_loopGoto"      &%test% unittest
    set "version=C_loopFor"       &%test% unittest
    set "version=D_loopForJudago" &%test% unittest
    set "version=E_loopForOptim"  &%test% unittest
    set "version=F_searchSkewed"  &%test% unittest
    set "version=G_chunksAttempt2"&%test% unittest
    set "version=H_chunksAttempt3"&%test% unittest
    set "version=I_chunks"        &%test% unittest
) >con
::>"logs\strLen_unittest.log"
::>con


%skip_test% (echo:&echo:&ver&echo:Points of interest :
    set "version=A_loopUnrolled"  &%test% correctness    0_start 8_span
    set "version=B_loopGoto"      &%test% correctness    0_start 8_span
    set "version=C_loopFor"       &%test% correctness    0_start 8_span
    set "version=D_loopForJudago" &%test% correctness    0_start 8_span
    set "version=E_loopForOptim"  &%test% correctness    0_start 8_span
    set "version=F_searchSkewed"  &%test% correctness  508_start 8_span
    set "version=G_chunksAttempt2"&%test% correctness 1020_start 8_span
                                   %test% correctness 2044_start 8_span
    set "version=H_chunksAttempt3"&%test% correctness 1020_start 8_span
                                   %test% correctness 2044_start 8_span
    set "version=I_chunks"        &%test% correctness    0_start 8_span
                                   %test% correctness 1020_start 8_span
                                   %test% correctness 7164_start 8_span
) >con
::>"logs\strLen_correctness_partial.log"
::>con


%do_test% (echo:&echo:&ver&echo:Limits :
    rem function dependent :
    set "version=A_loopUnrolled"  &%test% correctness 1020_start 8_span
    set "version=B_loopGoto"      &%test% correctness 1020_start 8_span
    set "version=C_loopFor"       &%test% correctness 1020_start 8_span
    set "version=D_loopForJudago" &%test% correctness 1020_start 8_span
    set "version=E_loopForOptim"  &%test% correctness 1020_start 8_span
    set "version=F_searchSkewed"  &%test% correctness 1020_start 8_span
   
    rem system dependent (command line limit minus some length for the used command) :
    set "version=G_chunksAttempt2"&%test% correctness 8183_start 8_span
    set "version=H_chunksAttempt3"&%test% correctness 8183_start 8_span
    set "version=I_chunks"        &%test% correctness 8183_start 8_span
) >con
::>"logs\strLen_correctness_limits.log"
::>con


:: with all versions, takes 5 good minutes at 1.7 GHz :
%skip_test% (echo:&echo:&ver&echo:Comprehensive testing :
    set "version=A_loopUnrolled"  &%test% correctness 0_start 1030_span
    set "version=B_loopGoto"      &%test% correctness 0_start 1030_span
    set "version=C_loopFor"       &%test% correctness 0_start 1030_span
    set "version=D_loopForJudago" &%test% correctness 0_start 1030_span
    set "version=E_loopForOptim"  &%test% correctness 0_start 1030_span
    set "version=F_searchSkewed"  &%test% correctness 0_start 1030_span
    set "version=G_chunksAttempt2"&%test% correctness 0_start 8200_span
    set "version=H_chunksAttempt3"&%test% correctness 0_start 8200_span
    set "version=I_chunks"        &%test% correctness 0_start 8200_span
) >"logs\strLen_correctness_full_test.log"
::>"logs\strLen_correctness_full_test.log"
::>con


:: with all versions, takes 10 good minutes at 1.7 GHz :
%skip_test% (echo:&echo:&echo:&ver&echo:Speed comparisons :>con
    echo:&set "version=A_loopUnrolled"  &%test% speed    0_start  250_span 8_times
    rem "warm up" done.
                                         %test% speed    0_start  250_span 8_times
                                         %test% speed  250_start  250_span 8_times
                                         %test% speed  500_start  250_span 8_times
                                         %test% speed  750_start  250_span 8_times
    echo:&set "version=B_loopGoto"      &%test% speed    0_start  250_span 8_times
                                         %test% speed  250_start  250_span 8_times
                                         %test% speed  500_start  250_span 8_times
                                         %test% speed  750_start  250_span 8_times
    echo:&set "version=C_loopFor"       &%test% speed    0_start  250_span 8_times
    echo:&set "version=D_loopForJudago" &%test% speed    0_start  250_span 8_times
    echo:&set "version=E_loopForOptim"  &%test% speed    0_start  250_span 8_times
                                         %test% speed  250_start  250_span 8_times
                                         %test% speed  500_start  250_span 8_times
                                         %test% speed  750_start  250_span 8_times
    echo:&set "version=F_searchSkewed"  &%test% speed    0_start  250_span 8_times
                                         %test% speed  250_start  250_span 8_times
                                         %test% speed  500_start  250_span 8_times
                                         %test% speed  750_start  250_span 8_times
    echo:&set "version=G_chunksAttempt2"&%test% speed    0_start  250_span 8_times
                                         %test% speed  250_start  250_span 8_times
                                         %test% speed  500_start  250_span 8_times
                                         %test% speed  750_start  250_span 8_times
                                         %test% speed 1000_start 1000_span 2_times
                                         %test% speed 2000_start 1000_span 2_times
                                         %test% speed 7000_start 1000_span 2_times
    echo:&set "version=H_chunksAttempt3"&%test% speed    0_start  250_span 8_times
                                         %test% speed  250_start  250_span 8_times
                                         %test% speed  500_start  250_span 8_times
                                         %test% speed  750_start  250_span 8_times
                                         %test% speed 1000_start 1000_span 2_times
                                         %test% speed 2000_start 1000_span 2_times
                                         %test% speed 7000_start 1000_span 2_times
    echo:&set "version=I_chunks"        &%test% speed    0_start  250_span 8_times
                                         %test% speed  250_start  250_span 8_times
                                         %test% speed  500_start  250_span 8_times
                                         %test% speed  750_start  250_span 8_times
                                         %test% speed 1000_start 1000_span 2_times
                                         %test% speed 2000_start 1000_span 2_times
                                         %test% speed 7000_start 1000_span 2_times
    rem to check consistency, the first test is repeated :
    echo:&set "version=A_loopUnrolled"  &%test% speed    0_start  250_span 8_times
) >>"logs\strLen_speed_comparisons.log"
::>>"logs\strLen_speed_comparisons.log"
::>con


:: cleaning :
if exist tmp\ rmdir /S /Q tmp
echo:
echo:
echo:strLen tests : completed.
pause
ENDLOCAL
@echo on
@EXIT /b



strLen_tests_stub.txt

Code: Select all

:: echo stays as defined by the calling script
:: 2010-10-21
SETLOCAL ENABLEDELAYEDEXPANSION

::                                      tmp\<this script>.cmd
:: always working in the parent directory of this script :
%~d0 & cd "%~dp0" & cd ..

echo:
:: "nul set p" see http://groups.google.com/group/microsoft.public.win2000.cmdprompt.admin/browse_frm/thread/0874b48308d94030
<nul set /p "?=*" >con
<nul set /p "?=%~n0 %* :"
call :%*

ENDLOCAL
:: echo stays as defined by the calling script
@EXIT /b
:: Functions :


:correctness begin span
::           int   int
echo:
SETLOCAL ENABLEDELAYEDEXPANSION
set "begin=%~1" & set /a "begin+=0" &rem sanitization used to allow number labelling
set "end=%~2"   & set /a "end+=0, end+=begin, end-=1"
set "s="
for /l %%l in (0,1,%begin%) do set s=x!s!
set s=%s:~1%
::
for /l %%l in (%begin%,1,%end%) do (
    call :strLen s len
    if "!len!"=="%%l" ( set "res= OK " & set "msg=" ) else ( set "res=FAIL" & set "msg=  should be '%%l'" )
    echo:[!res!]  '!len!'!msg!
    set s=x!s!
)
ENDLOCAL
EXIT /b


:speed begin span repeat
::     int   int  int
SETLOCAL ENABLEDELAYEDEXPANSION
set "begin=%~1"  & set /a "begin+=0" &rem sanitization used to allow number labelling
set "end=%~2"    & set /a "end+=0, end+=begin, end-=1"
set "repeat=%~3" & set /a "repeat+=0"
set "s_init="
for /l %%i in (0,1,%begin%) do set s_init=x!s_init!
set s_init=%s_init:~1%
::
call :getTod tcs_start
for /l %%j in (1,1,%repeat%) do (
    set s=%s_init%
    for /l %%i in (%begin%,1,%end%) do (
        call :strLen s len
        set s=x!s!
    )
)
call :getTod tcs_finish
set /a "tcs=tcs_finish-tcs_start" & if !tcs! LSS 0 set /a tcs+=8640000
<nul set /p "?= %tcs% cs."
ENDLOCAL
EXIT /b


:: less locale dependent version of getTod
:: adapted from http://www.dostips.com/DtCodeFunctions.php#_Toc128586395
::
:getTod -- get a Time of Day value in 1/100th seconds
::   -- %~1: out - time of day
SETLOCAL
set t=%time: =0%
set /a t=((1%t:~0,2%*60+1%t:~3,2%)*60+1%t:~6,2%)*100+1%t:~9,2%-36610100
( ENDLOCAL & REM RETURN VALUES
    IF "%~1" NEQ "" SET %~1=%t%
)
GOTO:EOF


:unittest
echo:
:: some "!" here requires to disable delayed expansion
SETLOCAL DISABLEDELAYEDEXPANSION
echo:---- :unittest.strLen - output
call :unittest.strLen
echo:----
ENDLOCAL
EXIT /b


:unittest.strLen
:$created 20081124 :$changed 20081124
:$source http://www.dostips.com
for %%C in (
        ""
        "H"
        "Hi"
        "Hi_"
        "Hi T"
        "Hi There, what's up!"
    ) do (
    set "str=%%~C"
    call:strLen str len && (
        call:Format "[5] characters for [20]." "'%%len%%'" "'%%str%%'"
    )
)
EXIT /b


:Format fmt str1 str2 ... -- outputs columns of strings right or left aligned
::                        -- fmt [in] - format string specifying column width and alignment, i.e. "[-10][10][10]"
:$created 20060101 :$changed 20091130 :$categories Echo
:$source http://www.dostips.com
SETLOCAL
set "fmt=%~1"
set "line="
set "spac=                                                     "
set "i=1"
for /f "tokens=1,2 delims=[" %%a in ('"echo..%fmt:]=&echo..%"') do (
    set /a i+=1
    call call set "subst=%%%%~%%i%%%spac%%%%%~%%i%%"
    if %%b0 GEQ 0 (call set "subst=%%subst:~0,%%b%%"
    ) ELSE        (call set "subst=%%subst:~%%b%%")
    call set "const=%%a"
    call set "line=%%line%%%%const:~1%%%%subst%%"
)
echo.%line%
EXIT /b


:: the tested version of strLen will be automatically appended here :



strLenA_loopUnrolled.txt

Code: Select all

:strLen string len -- returns the length of a string via binary search, maximum length 1023
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:$created 20081122 :$changed 20081122 :$categories StringOperation
:$source http://www.dostips.com
SETLOCAL ENABLEDELAYEDEXPANSION
set "str=A!%~1!"&rem keep the A up front to ensures we get the length and not the upper bound
                 rem it also avoids trouble in case of empty string
set len=0
set /a "n=1<<10"&rem 2^10=1024
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



strLenB_loopGoto.txt

Code: Select all

:strLen string len
SETLOCAL ENABLEDELAYEDEXPANSION
set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                 rem it also avoids trouble in case of empty string
rem             1024 == 2^10
set /a "len=0, iterations=10"
set /a "n=1<<%iterations%"
:strLen#binary_search
set /a "n>>=1, len|=n"
 if "!str:~%len%!"=="" set /a "len&=~n"
set /a "iterations-=1" & if 0 LSS !iterations! goto:strLen#binary_search
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



strLenC_loopFor.txt

Code: Select all

:strLen string len
SETLOCAL ENABLEDELAYEDEXPANSION
set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                 rem it also avoids trouble in case of empty string
rem       1024 == 2^10
set /a "len=0, n=1<<10"
for /l %%i in (10,-1,1) do (
    set /a "n>>=1, len|=n" & set tmp=^^!str:~!len!^^!
    for %%t in ("!tmp!") do if "%%~t"=="" set /a "len&=~n"
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



strLenD_loopForJudago.txt

Code: Select all

:strLen string len
SETLOCAL ENABLEDELAYEDEXPANSION
set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                 rem it also avoids trouble in case of empty string
rem       1024 == 2^10
set /a "len=0, n=1<<10"
for /l %%i in (10,-1,1) do (
    set /a "n>>=1, len|=n" & for %%l in ("!len!") do if "!str:~%%~l!"=="" set /a "len&=~n"
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



strLenE_loopForOptim.txt

Code: Select all

:strLen string len
SETLOCAL ENABLEDELAYEDEXPANSION
set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                 rem it also avoids trouble in case of empty string
set len=0
for /l %%e in (9,-1,0) do (
    set /a "len|=1<<%%e" & for %%l in ("!len!") do if "!str:~%%~l!"=="" set /a "len&=~(1<<%%e)"
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



strLenF_searchSkewed.txt

Code: Select all

:strLen
SETLOCAL ENABLEDELAYEDEXPANSION
set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                 rem it also avoids trouble in case of empty string
set len=0
for /l %%e in (8,-2,0) do (
    set /a "len+=1<<%%e"
    for %%l in ("!len!") do ( if "!str:~%%~l!"=="" ( set /a "len-=1<<%%e" ) else (
    set /a "len+=1<<%%e"
    for %%m in ("!len!") do ( if "!str:~%%~m!"=="" ( set /a "len-=1<<%%e" ) else (
    set /a "len+=1<<%%e"
    for %%n in ("!len!") do   if "!str:~%%~n!"==""   set /a "len-=1<<%%e"
)))))
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



strLenG_chunksAttempt2.txt

Code: Select all

:strLen string len
SETLOCAL ENABLEDELAYEDEXPANSION
set str=A!%~1!&rem keep the A up front to ensure we get the length and not the upper bound
               rem it also avoids trouble in case of empty string
set nbKiC=0&rem 1024 characters == 1 KiC
:strLen#repeat_until_less_than_1_KiC_remains_to_search_through
    set rmd=%str:~1024%
    if not "%rmd%"=="" (
        set /a nbKiC+=1
        set str=%rmd%
        goto:strLen#repeat_until_less_than_1_KiC_remains_to_search_through
    )
set len=0
for /l %%e in (9,-1,0) do (
    set /a "len|=1<<%%e" & for %%l in ("!len!") do if "!str:~%%~l!"=="" set /a "len&=~(1<<%%e)"
)
set /a "len+=nbKiC<<10"
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



strLenH_chunksAttempt3.txt

Code: Select all

:strLen string len
::       1024 characters == 1 KiC
if not "local_to_strLen"=="%nbKiC:~-15%" (
    SETLOCAL ENABLEDELAYEDEXPANSION
    set str=A!%~1!&rem keep the A up front to ensure we get the length and not the upper bound
                   rem it also avoids trouble in case of empty string
    set nbKiC=0
)
set rmd=%str:~1024%
if not "%rmd%"=="" (
    set /a nbKiC+=1
    set str=%rmd%
    set nbKiC=!nbKiC!local_to_strLen& goto:strLen
)
set len=0
for /l %%e in (9,-1,0) do (
    set /a "len|=1<<%%e" & for %%l in ("!len!") do if "!str:~%%~l!"=="" set /a "len&=~(1<<%%e)"
)
set /a "len+=nbKiC<<10"
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



strLenI_chunks.txt

Code: Select all

:strLen string len -- returns the length of a string via binary search
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:$created 20081122 :$changed 20101021 :$categories StringOperation
:$source http://www.dostips.com
:: the binary search occurs after all full chunks of 1024 characters (1 KiC) have been counted
:: this enables the measurement of lengths nearly as great as the command line limit
:: the shortness of the names s for "string" and r for "remainder" helps in this matter
:: some blocks seem superfluous, they improve speed
:: when initializing s, keep the A up front to ensure we get the length and not the upper bound
::                      it also avoids trouble in case of empty string
( SETLOCAL ENABLEDELAYEDEXPANSION
    set s=A!%~1!
    set nbKiC=0
)
:strLen#repeat_until_less_than_1_KiC_remains_to_search_through
(       set r=!s:~1024!
        if not "!r!"=="" (
            set /a nbKiC+=1
            set s=!r!
            goto:strLen#repeat_until_less_than_1_KiC_remains_to_search_through
        )
    set len=0
    for /l %%e in (9,-1,0) do (
        set /a "len|=1<<%%e" & for %%l in ("!len!") do if "!s:~%%~l!"=="" set /a "len&=~(1<<%%e)"
    )
    set /a "len+=nbKiC<<10"
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



Aaaand that's all !

ghostmachine4
Posts: 319
Joined: 12 May 2006 01:13

Re: strLen boosted

#3 Post by ghostmachine4 » 24 Oct 2010 20:34

why use batch in the first place??

Code: Select all

wscript.Echo Len(Wscript.Arguments(0))

sowgtsoi
Posts: 8
Joined: 23 Oct 2010 10:14

Re: strLen boosted

#4 Post by sowgtsoi » 27 Oct 2010 15:35

WSH is cool too :D.

That said, there is some hefty overlap between what batch and WSH scripting are fit for.

Obviously, when comes the time to measure the length of a string, WSH has the upper hand.

There are many reasons - objective/subjective/good/bad - though, to use batch rather than WSH (even in this case) :
  • immediacy : the transition from the command line to a batch script is frictionless.
  • organic growth : the little command/script that becomes beefier and beefier ; it can or can not become wise (need/cost/self-improvement/etc.) one day to port it to WSH/PowerShell/etc.
  • network effect, somehow : everybody and his dog has it, so there are plenty of ready-made solutions to problems of all kinds.
  • investment : the cost of switching is not worth it for a small script.
  • involvement : the cost of switching is not worth it for a big script when the limitation to overcome is, err, limited/tiny/not-a-definitive-showstopper.
  • fun : to see what a tool can do.
  • possibly other reasons that elude me now.


Edit : I meant PowerShell, wrote Powerbatch, which is a different beast.

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: strLen boosted

#5 Post by jeb » 15 Nov 2010 09:06

Hi sowgtsoi,

I'm always interested in solutions with pure batch, so your post is very interesting.

I have one idea to improve your solution.

With short strings (<=15 chars) you can use a lookup trick.

Code: Select all

set "str=%1"
call :strLength result str
echo %str% length is %result%
goto :eof

:strLength
setlocal EnableDelayedExpansion
set "temp=!str!FEDCBA9876543210"
set /a len=0x!temp:~15,1!
(
endlocal
set %1=%len%
goto :eof
)

jeb

amel27
Expert
Posts: 177
Joined: 04 Jun 2010 20:05
Location: Russia

Re: strLen boosted

#6 Post by amel27 » 16 Nov 2010 03:21

Excellent!.. From this follows quick code for string up to 255 chars:

Code: Select all

set "str=%1"
call :strLength result str
echo %str% length is %result%
goto :eof

:strLength
setlocal EnableDelayedExpansion
REM broken into multiple lines by forum moderator to address forum readability issue.  FOR BETTER PERFORMANCE SET TEMP IN SINGLE LINE
set "temp=!str!!str!"
set "temp=!temp!FFFEFDFCFBFAF9F8F7F6F5F4F3F2F1F0EFEEEDECEBEAE9E8E7E6E5E4E3E2E1E0DFDEDDDCDBDAD9D8D7D6D5D4D3D2D1D0CFCECDCCCBCAC9C8C7C6C5C4C3C2C1C0"
set "temp=!temp!BFBEBDBCBBBAB9B8B7B6B5B4B3B2B1B0AFAEADACABAAA9A8A7A6A5A4A3A2A1A09F9E9D9C9B9A999897969594939291908F8E8D8C8B8A89888786858483828180"
set "temp=!temp!7F7E7D7C7B7A797877767574737271706F6E6D6C6B6A696867666564636261605F5E5D5C5B5A595857565554535251504F4E4D4C4B4A49484746454443424140"
set "temp=!temp!3F3E3D3C3B3A393837363534333231302F2E2D2C2B2A292827262524232221201F1E1D1C1B1A191817161514131211100F0E0D0C0B0A09080706050403020100"
set/a len=0x!temp:~510,2!
(
endlocal
set %1=%len%
goto :eof
)
Last edited by amel27 on 16 Nov 2010 22:34, edited 1 time in total.

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: strLen boosted

#7 Post by jeb » 16 Nov 2010 17:32

@Amel27: Nice extension, simple but good.

I think only about to sync the read values, but your solution is much better.

jeb

DosItHelp
Expert
Posts: 239
Joined: 18 Feb 2006 19:54

Re: strLen boosted

#8 Post by DosItHelp » 16 Nov 2010 22:07

Jeb, Amel, very very nice!


sowgtsoi,

Yes the :strlen function at http://www.dostips.com/DtCodeCmdLib.php#Function.strLen turned out the way it is because:

goto:label - is darn slow
"goto" appears to search for "label" by looking downwards in the batch starting from the current command until it hits the end of the batch, and then it wraps around to continue searching from the beginning. So goto performance is bad when jumping backwards and dependents on the size of the batch file.
Also if someone uses by accident the same label elsewhere in the script then it would find this other label first and the :strlen function would break.

bit shifting in loop - double expansion problem
Yes a script block within a loop is fast, because the whole script block is read from file only once, whereas script outside a block is read line by line from file right before processing. In fact a script can throw script code in front of its own feed while executing - very fun for writing script that modifies itself.
However changing the "len" variable used to determine sub-string of "str" requires double variable expansion, delayed-expansion and "!var!" syntax alone doesn't help here and using "call set ..." type expansion is slow because "call" creates a new command context each time.

So an unrolled loop was not to bad of a choice because:
It was performing good and you can easily scale the function up or down to adapt to different string length, i.e. doubling the supported string size requires only to change the initializer from "n=1<<10" to "n=1<<11" and duplicate another "set and if" line pair in the function body.

Well I admit I was never really proud of this function. Thanks a lot for helping speeding this up and making it much shorter.
The :strlen function has been updated in the library.

jeb
Expert
Posts: 1055
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: strLen boosted

#9 Post by jeb » 17 Nov 2010 01:27

Hi DosItHelp,

your solutions looks like the one of sowgtsoi, without the part of splitting down to 1024 chars.
It's more clear, but you can optimize the for-loop.

Because the maximum allowed string length is less than 8192 characters, this should always be enough

Code: Select all

for /L %%A in (12,-1,0) do


jeb

DosItHelp
Expert
Posts: 239
Joined: 18 Feb 2006 19:54

Re: strLen boosted

#10 Post by DosItHelp » 17 Nov 2010 17:37

oh - I should have looked a bit closer, no real need for my post this again (removed).

orange_batch
Expert
Posts: 442
Joined: 01 Aug 2010 17:13
Location: Canadian Pacific
Contact:

Re: strLen boosted

#11 Post by orange_batch » 21 Nov 2010 15:50

Nice work guys. I'll be the first to admit I suck with any functional relationship involving logical shifts and bitwise operations. :(
I know what they do, I just don't know how to apply them...

This is DosItHelp's script cut down. Could somebody teach me the thought process behind this?

Code: Select all

set "str=A!any_variable!"
set "len=0"

for /L %%A in (12,-1,0) do (
    set /a "len|=1<<%%A"
    for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
)

DosItHelp
Expert
Posts: 239
Joined: 18 Feb 2006 19:54

Re: strLen boosted

#12 Post by DosItHelp » 22 Nov 2010 00:32

Sure...

In short:
The value for the len variable is build successively by setting its binary bits using a loop and trial-and-error.
The loop loops 13 times. Each time looping a different bit is being decided on. The first loop decides about bit 12 (4096) of the len value the next about bit 11 (2048) working all the way down until the final loop decides about bit 0 (1).
During each loop the bit to be decided on is blindly being set (trial) causing the len variable to increase according to this bit position. A subsequent check tests if this bit caused the len variable now to become too large, in which case (error) the bit will be cleared again, otherwise it will stay. Then we move on to the next smaller bit position.

Code: Select all

decimal binary
------------------------
        5432109876543210
4096 =  0001000000000000
2048 =  0000100000000000
1024 =  0000010000000000
.512 =  0000001000000000
.256 =  0000000100000000
.128 =  0000000010000000
..64 =  0000000001000000
..32 =  0000000000100000
..16 =  0000000000010000
...8 =  0000000000001000
...4 =  0000000000000100
...2 =  0000000000000010
...1 =  0000000000000001

The bit position is easily determined using a loop looping from 12 down to 0:
for /L %%A in (12,-1,0) do (

The bit position in the len variable is set by left-shifting the number '1' according to the loop variable and applying a binary "OR" operator. This effectively results in a increase of the len value according to the bit position:
set /a "len|=1<<%%A"

To overcome the limitation of double variable expansion the new len variable is then temporary being storred in another loop varaible %%B.
for %%B in (!len!) do

A subsequent check tests if a character exists at the position given by the newly calculated len variable. If not then the string must be shorter and the bit must be cleared again.
if "!str:~%%B,1!"=="" ...

The bit is cleared (if needed) by applying a binary "AND" operation using a bit mask that allows all bits in len to stay execept the one that is to be cleared.
set /a "len&=~1<<%%A"


Some more background:
The problem at hand is that the command interpreter has no native support to determine the length of a string. However using variable-substitution it is possible to check if the string is shorter than a given length. For example you can say: give me the 16th character in a string and if the result is 'empty' then the string must be shorter then 16 characters.
Great so now we can determine the string length using trial-and-error starting with 1, if 1st character returns 'empty' then string length is 0, otherwise try 2nd character, then 3rd,... all the way to the max string size of 8191. That of course would work well (fast) for short strings but gets slower and slower the longer the string. Its like trying to find a word in a dictionary by reading the dictionary starting from the beginning page by page, until the word is found.
To speed this up we take a binary search approach.

The command interpreter supports strings that are less than 8192 characters long. So by checking the position at half that size (8192/2=4096) we eliminate half of the possible string length. For example if position 4096 returns 'empty' then we know the string length is less than 4096 and we don't have to check for length 4096..8191 any more. Then we take this renaining half (0..4095) and check it right at its half at position 2048 to eliminate the next smaller half of posible string lengths. For example if position 2084 returns not 'empty' then the string must be between 2048..4095.
Each loop will divide the remaining possibilities for the string length by 2, that's why its valid to say we use a binary 'search' (or 'trial-and-error') to determine the string length, and binary in this case also means efficient.

Hope this helps ;)

orange_batch
Expert
Posts: 442
Joined: 01 Aug 2010 17:13
Location: Canadian Pacific
Contact:

Re: strLen boosted

#13 Post by orange_batch » 22 Nov 2010 03:05

Thanks a lot. I'll study this.

Edit: Oh, I get it! That binary list helped a lot. Yay, now I'm not in the out-crowd. 8)

sowgtsoi
Posts: 8
Joined: 23 Oct 2010 10:14

Re: strLen boosted

#14 Post by sowgtsoi » 12 Jan 2011 18:20

bump

(tests' sources in the next post)


@jeb and @amel27 :
Fast ! As in : your use of a 'dipstick' completes in less than 66% of the time taken by a binary search :shock:.


@DosItHelp :
Compared to the version with some splitting, the 20101116 version of «:strLen» completes in less than 90% of the time for short strings and less than 35% of the time for long strings !
Substituting «!str:~%%B!» by «!str:~%%B,1!» in the comparison is something I wish I had thought to do :oops:.


@all :
About the 8 KiB limit : there are indeed good reasons to think that it won't change anytime soon.

Regarding the transition away from the potentially confusing «%%l» to the more legible «%%A» or «%%B», I think that this resource can be of interest.

So .. I've tried different hybrids "partial-binary-search-followed-by-a-gauging" by using a handfull of dipsticks containing 16 to 1024 graduations (in the which case this baroque horror of a dipstick fills a page in its entirety).
Fortunately, it appears that using a 256 graduations dipstick is the sweet spot that provides the fastest results.
For smaller dipsticks : the binary search takes too much time / for bigger dipsticks : appending, lookup, either or both is too costly.
In the process, I've also found that by reordering the digits of the dipstick it was possible to avoid the duplication of the measured string.
I've then tried to further reduce the footprint of the dipstick by eliminating some redundancy, but to no good effect.

All this leads to the fastest known version of «:strLen» to date :

Code: Select all

:: A binary search is complemented by the use of a dipstick to leverage the brutal efficiency of random access :
:strLen string len -- returns the length of a string
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:: Many thanks to 'sowgtsoi', but also 'jeb' and 'amel27' dostips forum users helped making this short and efficient
:$created 20081122 :$changed 20101227 :$categories StringOperation
:$source http://www.dostips.com
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,8) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
    for %%B in (!len!) do set str=!str:~%%B,-1!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000
    set /a len+=0x!str:~0x1FF,1!!str:~0xFF,1!
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b

Minuses :
The code is not as lean as the 20101116 version of «:strLen».

Plusses :
Compared to the already refined 20101116 version, this end result still manages to complete for short strings in a good 90% of the previous time, and for long strings in a small 80%.
Custom tailoring derived versions to specific cases is for the most unchanged *and* the code already hints toward what to do for very short strings.


Thanks for your time !

Edit 20110116 : replaced the word 'gauge' by the fitter : 'dipstick'.
Last edited by sowgtsoi on 16 Jan 2011 16:50, edited 2 times in total.

sowgtsoi
Posts: 8
Joined: 23 Oct 2010 10:14

Re: strLen boosted

#15 Post by sowgtsoi » 12 Jan 2011 18:25

To test on an XP system or higher, copy the following files in the dedicated folder described in the second post of this thread ("strLen_tests_stub.txt" and "strLenI_chunks.txt" should be present too).

Note : the differences in the amounts of comments of each function do have a notable impact on the accuracy of speed comparisons ; this impact is not significant though.



strLen_more_tests.cmd

Code: Select all

@echo off
:: 2011-01-08
SETLOCAL ENABLEDELAYEDEXPANSION

:: always working in the directory of this script :
%~d0 & cd "%~dp0"

:: setup :
if not exist logs\ mkdir logs
if exist tmp\ rmdir /S /Q tmp
mkdir tmp
for %%v in (
    I_chunks
    J_20101116
    K_dipstickAlone
    L_dipstick
    M_dipstickDeBruijn
    ) do (
    type "strLen_tests_stub.txt" "strLen%%v.txt" > "tmp\test_strLen%%v.cmd"
) 1>nul 2>nul

:: syntactic sugar :
set      "test= call ^"tmp\test_strLen^^!version^^!.cmd^" "
set   "do_test= "
set "skip_test= if 0==1 "


%do_test% (echo:&echo:&ver&echo:Unit test :
    set "version=I_chunks"          &%test% unittest
    set "version=J_20101116"        &%test% unittest
    set "version=K_dipstickAlone"   &%test% unittest
    set "version=L_dipstick"        &%test% unittest
    set "version=M_dipstickDeBruijn"&%test% unittest
) >con
::>"logs\strLen_unittest.log"
::>con


%skip_test% (echo:&echo:&ver&echo:Points of interest :
    set "version=I_chunks"          &%test% correctness    0_start 8_span
                                     %test% correctness 1020_start 8_span
                                     %test% correctness 7164_start 8_span
    set "version=J_20101116"        &%test% correctness    0_start 8_span
                                     %test% correctness 1020_start 8_span
                                     %test% correctness 7164_start 8_span
    set "version=K_dipstickAlone"   &%test% correctness    0_start 8_span
    set "version=L_dipstick"        &%test% correctness    0_start 8_span
                                     %test% correctness 1020_start 8_span
                                     %test% correctness 7164_start 8_span
    set "version=M_dipstickDeBruijn"&%test% correctness    0_start 8_span
                                     %test% correctness 1020_start 8_span
                                     %test% correctness 7164_start 8_span
) >con
::>"logs\strLen_correctness_partial.log"
::>con


%do_test% (echo:&echo:&ver&echo:Limits :
    set "version=I_chunks"          &%test% correctness 8183_start 8_span
    set "version=J_20101116"        &%test% correctness 8183_start 8_span
    set "version=K_dipstickAlone"   &%test% correctness  250_start 8_span
    set "version=L_dipstick"        &%test% correctness 8183_start 8_span
    set "version=M_dipstickDeBruijn"&%test% correctness 8183_start 8_span
) >con
::>"logs\strLen_correctness_limits.log"
::>con


:: with all versions, takes 5 good minutes at 1.7 GHz :
%skip_test% (echo:&echo:&ver&echo:Comprehensive testing :
    set "version=I_chunks"          &%test% correctness 0_start 8200_span
    set "version=J_20101116"        &%test% correctness 0_start 8200_span
    set "version=K_dipstickAlone"   &%test% correctness 0_start  260_span
    set "version=L_dipstick"        &%test% correctness 0_start 8200_span
    set "version=M_dipstickDeBruijn"&%test% correctness 0_start 8200_span
) >"logs\strLen_correctness_full_test.log"
::>"logs\strLen_correctness_full_test.log"
::>con


:: with all versions, takes 5 good minutes at 1.7 GHz :
%skip_test% (echo:&echo:&echo:&ver&echo:Speed comparisons :>con
    echo:&set "version=I_chunks"          &%test% speed    0_start  250_span 8_times
    rem "warm up" done.
                                           %test% speed    0_start  250_span 8_times
                                           %test% speed  250_start  250_span 8_times
                                           %test% speed  500_start  250_span 8_times
                                           %test% speed  750_start  250_span 8_times
                                           %test% speed 1000_start 1000_span 2_times
                                           %test% speed 2000_start 1000_span 2_times
                                           %test% speed 7000_start 1000_span 2_times
    echo:&set "version=J_20101116"        &%test% speed    0_start  250_span 8_times
                                           %test% speed  250_start  250_span 8_times
                                           %test% speed  500_start  250_span 8_times
                                           %test% speed  750_start  250_span 8_times
                                           %test% speed 1000_start 1000_span 2_times
                                           %test% speed 2000_start 1000_span 2_times
                                           %test% speed 7000_start 1000_span 2_times
    echo:&set "version=K_dipstickAlone"   &%test% speed    0_start  250_span 8_times
                                           %test% speed    0_start  250_span 8_times
                                           %test% speed    0_start  250_span 8_times
    echo:&set "version=L_dipstick"        &%test% speed    0_start  250_span 8_times
                                           %test% speed  250_start  250_span 8_times
                                           %test% speed  500_start  250_span 8_times
                                           %test% speed  750_start  250_span 8_times
                                           %test% speed 1000_start 1000_span 2_times
                                           %test% speed 2000_start 1000_span 2_times
                                           %test% speed 7000_start 1000_span 2_times
    echo:&set "version=M_dipstickDeBruijn"&%test% speed    0_start  250_span 8_times
                                           %test% speed  250_start  250_span 8_times
                                           %test% speed  500_start  250_span 8_times
                                           %test% speed  750_start  250_span 8_times
                                           %test% speed 1000_start 1000_span 2_times
                                           %test% speed 2000_start 1000_span 2_times
                                           %test% speed 7000_start 1000_span 2_times
    rem to check consistency, the first test is repeated :
    echo:&set "version=I_chunks"          &%test% speed    0_start  250_span 8_times
) >>"logs\strLen_speed_comparisons.log"
::>>"logs\strLen_speed_comparisons.log"
::>con


:: cleaning :
if exist tmp\ rmdir /S /Q tmp
echo:
echo:
echo:strLen tests : completed.
pause
ENDLOCAL
@echo on
@EXIT /b



strLenJ_20101116.txt

Code: Select all

:strLen string len -- returns the length of a string
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:: Many thanks to 'sowgtsoi', but also 'jeb' and 'amel27' dostips forum users helped making this short and efficient
:$created 20081122 :$changed 20101116 :$categories StringOperation
:$source http://www.dostips.com
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,0) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b



strLenK_dipstickAlone.txt

Code: Select all

:strLen
( SETLOCAL ENABLEDELAYEDEXPANSION
    set tmp=!%~1!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000
    set /a len=0x!tmp:~0x1FF,1!!tmp:~0xFF,1!
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET "%~2=%len%"
)
EXIT /b



strLenL_dipstick.txt

Code: Select all

:: A binary search is complemented by the use of a dipstick to leverage the brutal efficiency of random access :
:strLen string len -- returns the length of a string
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:: Many thanks to 'sowgtsoi', but also 'jeb' and 'amel27' dostips forum users helped making this short and efficient
:$created 20081122 :$changed 20101227 :$categories StringOperation
:$source http://www.dostips.com
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,8) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
    for %%B in (!len!) do set str=!str:~%%B,-1!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000
    set /a len+=0x!str:~0x1FF,1!!str:~0xFF,1!
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b



strLenM_dipstickDeBruijn.txt

Code: Select all

:: Unfruitful attempt at using a smaller dipstick, a "B(16,2) De Bruijn sequence", crafted by
::   listing the 256 hexadecimal pairs in decreasing order, and by removing the redundancies.
:: Each sliding pair in the dipstick now appears once only and there remains 256 of them :
::   each can be seen as a unique encoding of its index.
:: To simplify the decoding, the used dipstick is rotated rightward once.
:: Here, the decoding computations do not manage to clearly beat random access, speedwise.
:strLen string len -- returns the length of a string
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,8) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
    for %%B in (!len!) do set str=!str:~%%B,-1!^
0FFEFDFCFBFAF9F8F7F6F5F4F3F2F1F^
0EEDECEBEAE9E8E7E6E5E4E3E2E1E^
0DDCDBDAD9D8D7D6D5D4D3D2D1D^
0CCBCAC9C8C7C6C5C4C3C2C1C^
0BBAB9B8B7B6B5B4B3B2B1B^
0AA9A8A7A6A5A4A3A2A1A^
0998979695949392919^
08878685848382818^
077675747372717^
0665646362616^
05545352515^
044342414^
0332313^
02212^
011^
00
    set /a "bruijn=0x!str:~0xFF,2!, sixteenth=bruijn>>4, domain=bruijn-(sixteenth<<4|sixteenth)"
    if !domain! LSS 0 (
        set /a "len+=sixteenth*sixteenth+((0xF&bruijn)<<1)"
    ) else (
        set /a "abscissa=(0xF&bruijn)-1+^!sixteenth, len+=(sixteenth<<1)+abscissa*(abscissa+2)"
    )
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b



Aaaand that should be all !

Edit 20110116 : replaced the word 'gauge' by the fitter : 'dipstick'.
Last edited by sowgtsoi on 16 Jan 2011 16:55, edited 1 time in total.

Post Reply