Sort tokens within a string & Disable FOR /F EOL option
Posted: 11 May 2011 15:06
2011-07-10: Changed title to reference the discussion of the FOR /F EOL option, which begins in the 3rd post of this thread.
While working on a "universal" %DATE% parser it became necessary to sort tokens within a string. A few techniques were briefly bandied about that all relied on a fixed small number of tokens within the string. I thought it might be useful to have a generic function that can efficiently handle any number of tokens.
The first function I developed relies on a pipe to the SORT command, and is therefore case insensitive. I'm very happy that it does not require any explicit temporary files. (The SORT command can internally create a temporary file, but I doubt tokens within a single string could ever cause that to happen.) Performance is good, and it is virtually unaffected by the length of the string or the number of tokens.
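The heart of sortStrTokensI is a pipeline that echoes each token on its own line and lets SORT order those lines. The lines below are a hypothetical stand-alone demo of just that step (the sample tokens are made up, and this is not part of the function itself); the full function wraps the same pipeline in FOR /F to join the sorted lines back into one string.

```batch
@echo off
:: Hypothetical demo, not part of the original functions.
:: The left side of the pipe runs in a separate CMD instance where command
:: echoing is on by default, so without the @ each iteration's ECHO
:: command would be printed along with its output.
:: SORT orders the lines case insensitively, so Blue sorts first.
(for %%t in (red Blue yellow) do @echo %%t)|sort

:: Passing /R to SORT reverses the order; :sortStrTokensI simply hands
:: its second argument to SORT, which is how its /R option works.
(for %%t in (red Blue yellow) do @echo %%t)|sort /r
```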
Code:
@echo off
setlocal
set "str=red blue yellow white black grey green purple orange"
echo: unsorted str = %str%
call :sortStrTokensI str
echo: ascending str = %str%
call :sortStrTokensI str /r
echo:descending str = %str%
echo:
set str=I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
echo: unsorted = %str%
call :sortStrTokensI str
echo: native sort = %str%
set str=I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
call :sortStrTokensI str "/l C"
echo:^>=128 binary sort = %str%
exit /b
:sortStrTokensI StrVar ["sort options"]
::
:: Perform a case insensitive sort of tokens within the string contained
:: by variable StrVar.
::
:: By default the tokens are sorted using the local collating sequence
:: in ascending order. All sorts are case insensitive.
::
:: The following sort options can over-ride default behaviour
::
:: /R Specifies a descending sort.
::
:: "/L C" Characters greater than ASCII 127 are sorted according to
:: their binary encoding.
::
:: Multiple options should be enclosed by a single pair of quotes
::
:: This function does not properly handle tokens containing * or ?
::
setlocal enableDelayedExpansion
set "str=!%~1!"
set "sorted="
for /f %%a in ('^(for %%t in ^(!str!^) do @echo %%t^)^|sort %~2') do set "sorted=!sorted! %%a"
(endlocal
set "%~1=%sorted:~1%"
)
exit /b
Here are the sortStrTokensI test results:
Code:
unsorted str = red blue yellow white black grey green purple orange
ascending str = black blue green grey orange purple red white yellow
descending str = yellow white red purple orange grey green blue black
unsorted = I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
native sort = ï Ä Í É À È Ã Æ Ì Â Ë Á Ï Ê Å Î æ ì a A E e I i á à â ë î é ã ä å è í ê
>=128 binary sort = A a E e i I À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
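Both function headers warn that tokens containing * or ? are not handled properly. The reason is that a simple FOR treats any token containing a wildcard as a file mask. The hypothetical demo below (the *.txt pattern is made up for illustration) shows the effect.

```batch
@echo off
:: Hypothetical demo, not part of the original functions.
:: A simple FOR replaces a token containing * or ? with the names of the
:: matching files in the current directory, and silently drops the token
:: when nothing matches. Either way the original token is lost.
for %%t in (red *.txt blue) do echo %%t
```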
My next attempt is a case sensitive version that relies on neither SORT nor pipes. It is about three times faster than sortStrTokensI for a string with only 3 tokens, but its performance suffers dramatically as the number of tokens grows, and it becomes the slower of the two somewhere between 20 and 30 tokens. The function could be extended to support a case insensitive option, but I prefer the performance profile of the first function.
Code:
@echo off
setlocal
set "str=red blue yellow white black grey green purple orange"
echo: unsorted str = %str%
call :sortStrTokens str
echo: ascending str = %str%
call :sortStrTokens str /r
echo:descending str = %str%
echo:
set str=I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
echo: unsorted str = %str%
call :sortStrTokens str
echo: ascending str = %str%
call :sortStrTokens str /r
echo:descending str = %str%
exit /b
:sortStrTokens StrVar [/R]
::
:: Perform a case sensitive sort of tokens within the string contained
:: by variable StrVar.
::
:: By default the tokens are sorted using the local collating sequence
:: in ascending order.
::
:: The /R option (case insensitive) specifies a descending sort.
::
:: This function does not properly handle tokens containing * or ?
::
setlocal enableDelayedExpansion
set "str=!%~1!"
set "sorted="
if /i "%~2"=="/R" (set comp=geq) else set comp=leq
for %%t in (!str!) do (
if not defined sorted (set "sorted=%%t") else (
set "sorted2="
set placed=
for %%a in (!sorted!) do (
if not defined placed if %%t %comp% %%a (
set "sorted2=!sorted2! %%t"
set placed=true
)
set "sorted2=!sorted2! %%a"
)
if not defined placed set "sorted2=!sorted2! %%t"
set "sorted=!sorted2:~1!"
)
)
(endlocal
set "%~1=%sorted%"
)
exit /b
Here are the sortStrTokens test results:
Code:
unsorted str = red blue yellow white black grey green purple orange
ascending str = black blue green grey orange purple red white yellow
descending str = yellow white red purple orange grey green blue black
unsorted str = I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
ascending str = ï Ä Í É À È Ã Æ Ì Â Ë Á Ï Ê Å Î æ ì a A e E i I á à â ë î é ã å ä í è ê
descending str = ê è í ä å ã é î ë â à á I i E e A a ì æ Î Å Ê Ï Á Ë Â Ì Æ Ã È À É Í Ä ï
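The introduction to sortStrTokens mentions that it could be extended with a case insensitive option. One minimal way to do that, sketched below, is to add /I to the IF comparison and leave the insertion logic unchanged, so it should scale the same way as sortStrTokens. The routine name :sortStrTokensCI and the sample string are hypothetical, not from the original post.

```batch
@echo off
setlocal
:: Hypothetical usage with a made-up mixed-case string.
set "str=red Blue yellow White black"
call :sortStrTokensCI str
echo: ascending str = %str%
call :sortStrTokensCI str /r
echo:descending str = %str%
exit /b

:sortStrTokensCI StrVar [/R]
::
:: Hypothetical case insensitive variant of :sortStrTokens (not part of
:: the original post). The only change is IF /I in the comparison.
:: Tokens containing * or ? are still not handled.
::
setlocal enableDelayedExpansion
set "str=!%~1!"
set "sorted="
if /i "%~2"=="/R" (set comp=geq) else set comp=leq
for %%t in (!str!) do (
  if not defined sorted (set "sorted=%%t") else (
    set "sorted2="
    set "placed="
    for %%a in (!sorted!) do (
      rem Place the new token ahead of the first token that should follow it
      if not defined placed if /i %%t %comp% %%a (
        set "sorted2=!sorted2! %%t"
        set "placed=true"
      )
      set "sorted2=!sorted2! %%a"
    )
    rem Nothing should follow the new token, so it goes at the end
    if not defined placed set "sorted2=!sorted2! %%t"
    set "sorted=!sorted2:~1!"
  )
)
(endlocal
  set "%~1=%sorted%"
)
exit /b
```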
Here is a comparison of the performance profiles of the two functions:
Code:
Token   Seconds to Perform 100 Iterations
Count   sortStrTokensI   sortStrTokens
-----   --------------   -------------
    3        2.7              0.8
   10        2.9              1.4
   20        2.7              2.4
   30        2.8              4.1
   50        2.9              9.6
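A rough count helps explain the sortStrTokens column. As written, inserting the k-th token runs the inner FOR over all k-1 tokens already placed (the loop keeps going after the new token is placed, because it also rebuilds the string), so sorting n tokens takes about this many inner-loop passes:

```latex
\sum_{k=2}^{n} (k-1) = \frac{n(n-1)}{2}
```

That is 3 passes for 3 tokens but 1225 passes for 50 tokens. The measured times grow far less than 400-fold, presumably because each call also pays a fixed cost (CALL, SETLOCAL, ENDLOCAL) that dominates at small token counts. sortStrTokensI, by contrast, presumably spends most of its time starting the extra CMD and SORT processes needed for FOR /F and the pipe, a cost that barely depends on the number of tokens.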
As always, I'm interested to hear if anyone can point out problems, optimizations, or entirely new solutions.
Dave Benham