Performance Issues with Code

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Performance Issues with Code

#16 Post by Aacini » 05 Aug 2020 10:34

Eureka! wrote:
04 Aug 2020 16:25

Inspired by @Aacini's solution, here is some pseudo-code, as I don't have the time and experience to convert this to proper code:

Code:

Instead of month n=1..12, set month= 2^^n -1 (1.. 4095)
set /a min=4095, max=0
For loop:
  set /a min="min & month", max="max | month"

After the for-loop, convert 2^^n - 1 back to n.
 
Might be faster ..

This is a good idea! :D

Code:

@echo off
setlocal EnableDelayedExpansion

rem Empty environment
(
   for /F "delims==" %%a in ('set') do set "%%a="
   set "ComSpec=%ComSpec%"
)

rem Define the month elements with a bit set in the month position, that is:
rem Bits: 12 11 10  9  8  7  6  5  4  3  2  1   Value:
rem Jan:   0  0  0  0  0  0  0  0  0  0  0  1 = 2
rem Feb:   0  0  0  0  0  0  0  0  0  0  1  0 = 4
rem ...
rem Dec:   1  0  0  0  0  0  0  0  0  0  0  0 = 4096

set /A "i=0, j=100"
for %%a in (January February March April May June July August September October November December) do (
   set /A "i+=1, j+=1, bit=1<<i"
   set "month=%%a"
   set /A "m%%a=bit, m!month:~0,3!=bit, m!i!=bit, m!j:~1!=bit"
)

SET "zBits=0"
FOR /F "skip=1 USEBACKQ tokens=2 delims=|" %%a in ("test.txt") DO set /A "zBits|=m%%~a"

rem Convert accumulated bit positions back to months numbers: first bit for Min, last bit for Max
set "MINM="
for /L %%b in (1,1,12) do (
   set /A "bit=1<<%%b & zBits"
   if !bit! neq 0 (
      set "MAXM=%%b"
      if not defined MINM set "MINM=%%b"
   )
)

if %MAXM% lss 10 set "MAXM=0%MAXM%"
if %MINM% lss 10 set "MINM=0%MINM%"

echo Min: %MINM%
echo Max: %MAXM%
Antonio

PS - Please post the timing of my two versions...
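
For comparison, the pseudo-code quoted at the top of this post, taken literally, might look like the minimal sketch below. It assumes the months arrive as numbers 1..12 and uses a hard-coded sample list (7 3 11 5) in place of the real test.txt. The names m1..m12, min and max are illustrative and are not taken from the code above. With cumulative masks 2^n-1, AND keeps the smallest mask and OR keeps the largest, so each result equals one of the twelve masks and can be mapped back by comparison.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Cumulative masks: m1=1, m2=3, ... m12=4095 (that is, 2^n - 1)
for /L %%n in (1,1,12) do set /A "m%%n=(1<<%%n)-1"

rem Hypothetical sample months standing in for the month column of test.txt
set /A "min=4095, max=0"
for %%m in (7 3 11 5) do set /A "min&=m%%m, max|=m%%m"

rem Map the two resulting masks back to month numbers
for /L %%n in (1,1,12) do (
   if !min! equ !m%%n! set "MINM=%%n"
   if !max! equ !m%%n! set "MAXM=%%n"
)

echo Min: %MINM%
echo Max: %MAXM%
```

For the sample list this prints Min: 3 and Max: 11. Note that this variant runs two operations per row, while the single-bit version above needs only one.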

Eureka!
Posts: 137
Joined: 25 Jul 2019 18:25

#17 Post by Eureka! » 05 Aug 2020 12:56

ShadowThief wrote:
04 Aug 2020 17:32
I'm getting between 60 and 80 seconds for a million rows for this, likely because two set statements are being run every single iteration.

Thanks for testing (and writing the code, of course)!

Lesson learned: Comparing variables is "cheaper" than setting variables. Good to know!

Aacini wrote:
05 Aug 2020 10:34

Code:

[...]

Clever!! :thumbsup:

ShadowThief
Expert
Posts: 1166
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

#18 Post by ShadowThief » 05 Aug 2020 13:21

Aacini wrote:
05 Aug 2020 10:34
PS - Please post the timing of my two versions...

I ran both versions ten consecutive times with the same million-line data file.

Version 1:

Code:

Start:    14:53:15.67
End:      14:54:17.30
Duration: 61.6 seconds

Start:    14:54:17.31
End:      14:55:19.49
Duration: 62.2 seconds

Start:    14:55:19.49
End:      14:56:21.07
Duration: 61.6 seconds

Start:    14:56:21.07
End:      14:57:22.52
Duration: 61.5 seconds

Start:    14:57:22.53
End:      14:58:24.05
Duration: 61.5 seconds

Start:    14:58:24.05
End:      14:59:25.51
Duration: 61.5 seconds

Start:    14:59:25.51
End:      15:00:27.02
Duration: 61.5 seconds

Start:    15:00:27.02
End:      15:01:29.54
Duration: 62.5 seconds

Start:    15:01:29.54
End:      15:02:37.40
Duration: 67.9 seconds

Start:    15:02:37.41
End:      15:03:41.34
Duration: 63.9 seconds

Average Duration: 62.6 seconds

Version 2:

Code:

Start:    14:37:13.79
End:      14:37:47.59
Duration: 33.8 seconds

Start:    14:37:47.59
End:      14:38:21.38
Duration: 33.8 seconds

Start:    14:38:21.38
End:      14:38:55.22
Duration: 33.8 seconds

Start:    14:38:55.22
End:      14:39:30.02
Duration: 34.8 seconds

Start:    14:39:30.03
End:      14:40:05.44
Duration: 35.4 seconds

Start:    14:40:05.45
End:      14:40:40.72
Duration: 35.3 seconds

Start:    14:40:40.72
End:      14:41:15.61
Duration: 34.9 seconds

Start:    14:41:15.61
End:      14:41:50.93
Duration: 35.3 seconds

Start:    14:41:50.94
End:      14:42:25.61
Duration: 34.7 seconds

Start:    14:42:25.61
End:      14:43:01.46
Duration: 35.8 seconds

Average Duration: 34.8 seconds

SIMMS7400
Posts: 546
Joined: 07 Jan 2016 07:47

#19 Post by SIMMS7400 » 10 Aug 2020 04:10

WOW! I'm going to play around with these solutions and see what works best.

It looks like Shadow's and Antonio's solutions are comparable? I forgot how fast Shadow's code ran, so I will need to do some timed runs...

SIMMS7400
Posts: 546
Joined: 07 Jan 2016 07:47

#20 Post by SIMMS7400 » 10 Aug 2020 17:35

Hello Gents -

With a file of about 700k lines, Antonio's solution is about 30 seconds faster. That is a pretty significant difference, but both are still unbelievably quick.

Thank you again to you both for these solutions!!

ShadowThief
Expert
Posts: 1166
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

#21 Post by ShadowThief » 10 Aug 2020 17:43

I'd love to see the times you ended up with, because all of my tests have my code run about 10 seconds faster than anything else that's been posted so far.

SIMMS7400
Posts: 546
Joined: 07 Jan 2016 07:47

#22 Post by SIMMS7400 » 10 Aug 2020 18:01

Here are the results:

Code:

Atonio's Solution 
Start Time : 19:54:44.36
End Time : 19:56:33.82
Shadow's Solution 
Start Time : 19:57:22.91
End Time : 19:58:54.68
File Size : 797761 lines

I stand corrected: Shadow's solution is quicker! My apologies for the incorrect message earlier; I was trying to time things "from the hip" on the fly.
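
The results above list only start and end times, so the durations have to be worked out by hand. A minimal sketch of how that subtraction could be done in batch follows. It assumes stamps in the HH:MM:SS.cc form shown above (the format of %TIME% depends on the locale), and the subroutine names :Elapsed and :ToCentis are hypothetical helpers, not part of anyone's posted solution.

```batch
@echo off
setlocal

rem Start and end times copied from the results above
call :Elapsed "19:54:44.36" "19:56:33.82"
call :Elapsed "19:57:22.91" "19:58:54.68"
exit /b

:Elapsed  startStamp  endStamp
call :ToCentis "%~1" c0
call :ToCentis "%~2" c1
set /A "d=c1-c0"
rem Assumption: a negative difference means the run crossed midnight
if %d% lss 0 set /A "d+=8640000"
set /A "s=d/100, cs=d%%100"
if %cs% lss 10 set "cs=0%cs%"
echo %~1 to %~2 = %s%.%cs% seconds
exit /b

:ToCentis  stamp  resultVar
set "t=%~1"
rem The TIME variable pads single-digit hours with a space; turn it into a zero
set "t=%t: =0%"
rem The leading 1 and the -100 keep "08" and "09" from being parsed as invalid octal
for /F "tokens=1-4 delims=:.," %%a in ("%t%") do set /A "%~2=(((1%%a-100)*60+(1%%b-100))*60+(1%%c-100))*100+(1%%d-100)"
exit /b
```

With the times listed above this gives 109.46 seconds for the first run and 91.77 seconds for the second, a difference of roughly 18 seconds on the 797761-line file.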

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México

#23 Post by Aacini » 12 Aug 2020 05:27

Which of ShadowThief's solutions is faster than which of Antonio's solutions? I wrote two of them. The main loop in the second one is this:

Code:

FOR /F "skip=1 USEBACKQ tokens=2 delims=|" %%a in ("test.txt") DO set /A "zBits|=m%%~a"

It is hard for me to believe that such a loop is slower than this one:

Code:

FOR /F "skip=1 usebackq tokens=2 delims=|" %%a IN ("test.txt") DO ( 
    IF !month_val[%%~a]! GTR !MAXM! SET "MAXM=!month_val[%%~a]!"
    IF !month_val[%%~a]! LSS !MINM! SET "MINM=!month_val[%%~a]!"
)

Could you please test my second code again, replacing zBits with bits? Thanks...

Antonio