Memory leak when reading large text files

vin97
Posts: 35
Joined: 17 Apr 2020 08:30

Memory leak when reading large text files

#1 Post by vin97 » 29 Jul 2020 06:16

I get a bad buffer overflow when reading large text files (3+ MB) through a FOR loop.
Looking at the task manager, the memory usage keeps rising and rising, up to the point of a system crash. The strange thing is that no single process ever shows more than 50 MB in the process list at any time.
Also, when I stop the program and try to clear memory manually, it only barely works, freeing maybe 200 MB or so.

The only thing done is counting the number of lines containing the string ":\".

Code:

@echo off
setlocal enabledelayedexpansion
set /a cnt=0
rem Count the lines that contain ":\" - one ECHO | FIND pipe per line.
for /f "delims=" %%i in (list.txt) do (
	set "line=%%i"
	echo "!line!" | find /i ":\" > nul
	if not !errorlevel!==1 (set /a cnt=!cnt!+1)
)
Only reading one line at a time gives the same result:

Code:

set /a cnt=0
set /a k=0
:Loop
set /a k=%k%+1
for /f "skip=%k% delims=" %%i in (list.txt) do (set "line=%%i" & goto Loop2)
:Loop2
if "%line%"=="end line" (goto end)
echo "%line%" | find /i ":\" > nul
if not %errorlevel%==1 (set /a cnt=%cnt%+1)
goto Loop

My question is: how do I successfully free up the memory in this case?
My plan was to constantly check how much memory is available. When a certain threshold is reached, the loop is terminated and all progress is written to a temporary file. Memory would then be cleared and a new loop would be started.
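
For the check itself, I was thinking of something along these lines, run between passes of the loop (untested sketch; WMIC reports free physical memory in KB, the 500000 KB threshold is an arbitrary placeholder, and "progress.tmp" and the :restart label only stand in for my own checkpoint and restart logic):

Code:

set "freeKB="
for /f "skip=1" %%m in ('wmic OS get FreePhysicalMemory') do (
    if not defined freeKB set "freeKB=%%m"
)
if %freeKB% lss 500000 (
    rem Placeholder checkpoint: save the running count and bail out of this pass.
    >"%temp%\progress.tmp" echo %cnt%
    goto :restart
)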

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Memory leak when reading large text files

#2 Post by Squashman » 29 Jul 2020 08:51

The FOR command reads the ENTIRE file into memory before it executes any of the DO block.
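
If it is the whole-file buffering that worries you, the file can also be read one line at a time with SET /P and input redirection. Here is a rough, untested sketch of just the counting part (it also swaps the per-line ECHO | FIND pipe for a substring comparison); keep in mind SET /P has its own quirks, e.g. very long lines get truncated:

Code:

@echo off
setlocal enabledelayedexpansion
set /a cnt=0
rem Count the lines first so the FOR /L knows how many SET /P reads to perform.
for /f %%n in ('find /c /v "" ^< list.txt') do set "lines=%%n"
rem The redirected block feeds list.txt to SET /P one line per read, so the
rem whole file is never held in memory at once.
<"list.txt" (
    for /l %%i in (1,1,%lines%) do (
        set "line="
        set /p "line="
        rem If stripping everything up to the first ":\" changes the line, it contained ":\".
        if defined line if "!line!" neq "!line:*:\=!" set /a cnt+=1
    )
)
echo cnt=%cnt%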

vin97
Posts: 35
Joined: 17 Apr 2020 08:30

Re: Memory leak when reading large text files

#3 Post by vin97 » 29 Jul 2020 08:57

Strange, because memory usage keeps rising steadily at a similar pace while the loop is running, and it stops as soon as the program is done.
It takes a few minutes to scan the text file and builds up 2+ GB of memory usage in the process.

Eureka!
Posts: 137
Joined: 25 Jul 2019 18:25

Re: Memory leak when reading large text files

#4 Post by Eureka! » 29 Jul 2020 12:40

vin97 wrote (29 Jul 2020 06:16):
The only thing done is counting the number of lines containing the string ":\".
In that case, you might "get away" with using find /c instead:

Code:

for /f "usebackq tokens=2 delims=:"  %%x in (`find /c /i ":\" "list.txt" `)  Do echo COUNT=%%x
(the /i is not needed in this case, but was added on auto-pilot :-) )

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Memory leak when reading large text files

#5 Post by dbenham » 29 Jul 2020 17:05

Here is a very fast solution that shouldn't have any memory issues. I use FINDSTR to do all the heavy lifting.

The first FINDSTR only preserves lines that contain ":\" and lines that equal "end line". The odd syntax is due to FINDSTR peculiarities when dealing with \ and " characters.
That result is piped to a 2nd FINDSTR that only preserves lines that equal "end line", and it prefixes each line with the line number followed by a colon.

The FOR /F only reads the "end line" lines, which should be a small number, probably only 1.

The desired count is simply the position of the first "end line" minus one. The GOTO breaks out of the loop after the first "end line" is found.

Code:

@echo off
for /f "delims=:" %%N in (
  'findstr /rc:"^end line$" /c:":[\\]" "list.txt" ^| findstr /nxc:"end line"'
) do set /a cnt=%%N-1 & goto :done
:done
echo cnt=%cnt%

Dave Benham

vin97
Posts: 35
Joined: 17 Apr 2020 08:30

Re: Memory leak when reading large text files

#6 Post by vin97 » 30 Jul 2020 04:30

Thanks for those snippets, but the FOR loop was just an example to show that even this simple version creates memory issues.
In the actual program, there is a lot of other stuff that needs to happen inside the loop.

Does nobody have an idea where those leaks are coming from in the first place, or how the memory could be cleared properly?

Now that I think about it, I had similar issues with another batch file. In that case there was not a whole lot happening in the loop either, but it produced the same memory problems when left running long enough. The only thing the two batch files have in common is the looped FIND command.

miskox
Posts: 630
Joined: 28 Jun 2010 03:46

Re: Memory leak when reading large text files

#7 Post by miskox » 30 Jul 2020 07:00

Maybe this is the reason: viewtopic.php?f=3&t=5495&p=34663

You actually create lots of files...

I had problems with the nonpaged pool...

Saso

vin97
Posts: 35
Joined: 17 Apr 2020 08:30

Re: Memory leak when reading large text files

#8 Post by vin97 » 31 Jul 2020 05:46

That seems to be it.
The problem now is that there doesn't seem to be a way to clear the nonpaged pool without rebooting.
Maybe I'm missing something, though?

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Memory leak when reading large text files

#9 Post by penpen » 31 Jul 2020 10:58

You could report that bug to your virus scanner vendor.
Also, you should avoid massive use of piping (as Eureka! and Dave did); an alternative to the following piping

Code:

echo "!line!" | find /i ":\" > nul
	if not !errorlevel!==1 (set /a cnt=!cnt!+1)
would be to use something like the following (untested):

Code:

	if "!line!" neq "!line:*:\=!" set /a cnt=!cnt!+1
Depending on your code you might have more options.


penpen

vin97
Posts: 35
Joined: 17 Apr 2020 08:30

Re: Memory leak when reading large text files

#10 Post by vin97 » 01 Aug 2020 06:04

Thanks for the tips!

I think the leak is coming from some driver. Deactivating the antivirus gives the same result.
So there is really no way to clear the nonpaged pool manually?

miskox
Posts: 630
Joined: 28 Jun 2010 03:46

Re: Memory leak when reading large text files

#11 Post by miskox » 01 Aug 2020 11:17

No. You can't free the nonpaged pool - only a reboot does that. That is why my .bat stops when the pool gets too close to the point where the system becomes unusable.

You have to find which driver it is. I used POOLMON.EXE...

You can do the test in a virtual machine with a clean Windows install. If there are no problems there, then you know where to start.
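
If you want the batch file itself to keep an eye on the pool, something like this might work (untested; it assumes the Win32_PerfFormattedData_PerfOS_Memory WMI class is available, and the 400 MB threshold and the GOTO :EOF stop are only placeholders - measure your own system first):

Code:

set "poolBytes="
for /f "skip=1" %%p in ('wmic path Win32_PerfFormattedData_PerfOS_Memory get PoolNonpagedBytes') do (
    if not defined poolBytes set "poolBytes=%%p"
)
rem Note: cmd compares numbers as 32-bit values, so a pool above ~2 GB may not compare reliably.
if %poolBytes% gtr 419430400 (
    echo Nonpaged pool is getting large - stopping before the system becomes unusable.
    goto :eof
)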

Hope this helps.
Saso

vin97
Posts: 35
Joined: 17 Apr 2020 08:30

Re: Memory leak when reading large text files

#12 Post by vin97 » 01 Aug 2020 11:58

Ok.

Is it only the nonpaged pool that can cause a system crash, or does this also happen when memory is low in general? In other words, is it enough to only monitor the nonpaged pool in the batch file?

What memory percentage do you suggest as the threshold for stopping the program?

miskox
Posts: 630
Joined: 28 Jun 2010 03:46

Re: Memory leak when reading large text files

#13 Post by miskox » 01 Aug 2020 12:17

Memory in the nonpaged pool remains allocated until it is released by whoever allocated it. So it causes problems if the process that allocated the memory never frees it.

Regarding the %? I really don't know. Be on the safe side (as you can see, I did my research on XP back in 2014 - you probably have Windows 10, and there might be different tools to investigate this). You can run tests now and see when the system starts to behave strangely.

Find the .dll that is exhausting the nonpaged pool. Then you can contact the manufacturer about it.

I might be wrong about this: the paged pool is stored in RAM but can be paged out to disk (the pagefile), and the pagefile can be grown if required - the system might slow down during that process. So I would think the paged pool cannot cause these problems, because that memory can be moved to disk (the pagefile) if necessary.

Are you sure the antivirus is not causing the problems? Which one are you using?

Saso

vin97
Posts: 35
Joined: 17 Apr 2020 08:30

Re: Memory leak when reading large text files

#14 Post by vin97 » 01 Aug 2020 13:21

Just Windows Defender.

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Memory leak when reading large text files

#15 Post by penpen » 01 Aug 2020 18:51

In case it is related to the number of processes you create by using piping (although the behaviour on my Win XP 64 was different), access to some special processor tables (for example the Local Descriptor Table, the Global Descriptor Table, ...) could be messed up (fragmented table memory, deadlock, ...). But I guess you need at least ~40,000 processes to reach that state on modern computers (needless to say, I haven't checked that).
I once programmed a batch file to find Tangle solutions (with ideas shared between trebor68 and me), which uses findstr processes massively; the last version does not create enough processes on its own, but if you start 10-20 instances it should trigger that issue (at least on my old machine - you might need more on a modern computer), so you could check whether that results in the same behaviour; see:
viewtopic.php?f=3&t=6407&p=41200&hilit=tangle#p41180

If that's the case, then a hint would be that those issues should be detected and resolved by Windows on its own after you leave the system alone (don't start any processes, don't move the mouse, end non-system processes that use the HDD or software interrupts) for 10-60 minutes.
The only thing you could do then is redesign your algorithm to avoid starting and ending that many processes in such a short time.


penpen
