FINDSTR "out of memory" and "cannot open" files

Posted: 28 Sep 2022 13:49
by Ken852
Dear DOS experts,

I'm writing these lines in the hope that someone here will pick up on this and help me understand what's going on with this FINDSTR command. I wrote a more detailed post about this on Reddit. While I did get a few upvotes, no one took the time to reply. Maybe no one knows the answer?

Basically, I have a file and folder structure that looks like this.

Code:

C:.
└───Main Folder
    ├───Folder 1
    │   ├───Folder 1
    │   │       file 1.txt
    │   │       file 2.txt
    │   │       file 3.txt
    │   │       
    │   ├───Folder 2
    │   │       file 1.txt
    │   │       file 2.txt
    │   │       file 3.txt
    │   │       
    │   └───Folder 3
    │           file 1.txt
    │           file 2.txt
    │           file 3.txt
    │           
    └───Folder 2
        ├───Folder 1
        │       file 1.txt
        │       file 2.txt
        │       file 3.txt
        │       
        ├───Folder 2
        │       file 1.txt
        │       file 2.txt
        │       file 3.txt
        │       
        └───Folder 3
                file 1.txt
                file 2.txt
                file 3.txt
It's two sets of folders, each containing a few thousand subfolders, and in each of those are a handful of text files.

Starting from the root folder (Main Folder in this example), I am trying to use FINDSTR with the /S option to recursively search all the files within this tree structure for a specific string. But it fails with "out of memory" after 17 "cannot open" errors.

Code:

C:\Users\Ken\Desktop\DataMigration\Merge>findstr.exe /s "Project.45" *.txt
FINDSTR: Cannot open ParentFolder 1\Adam\Adams Folder\file.txt
FINDSTR: Cannot open ParentFolder 1\Ben\Bens Folder\file.txt
FINDSTR: Cannot open ParentFolder 1\Charlie\Charlies Folder\file.txt
FINDSTR: Cannot open ParentFolder 2\David\Davids Folder\file.txt
FINDSTR: Cannot open ParentFolder 2\Eric\Erics Folder\file.txt
FINDSTR: Cannot open ParentFolder 2\Freddie\Freddies Folder\file.txt
...
FINDSTR: Out of memory

However, if I point it directly at any one file of interest, it does find what I'm looking for. So it's not that the files don't exist. It seems to be more of a problem with traversing the folder structure.

I have already found a PowerShell alternative to this that does what I want. But I am curious why FINDSTR is failing. Both folder names and file names consist only of English alphabet characters, dashes, dots, parentheses and square brackets. Could this be the offending factor?

I did find the Q&A style post of "biblical proportions" by one of your regular users on this forum, and it implicates the /S option as being problematic, although I don't quite understand how. Can someone give me an example with my use case in mind? Can I not use this option for "matching files in the current directory and all subdirectories" like it says in the help section? Or does this mean something special and not what I expect?

Also, how do you run out of memory running such a command? I speculate it might be because I am running this on a large number of files (even though it's finding none!). But I have 32 GB of RAM and only half of that is in use. I'm not sure if this is what "memory" means in this context, but I have plenty of it.

Help me DOS experts, you're my only hope.

Regards,
Ken

Re: FINDSTR "out of memory" and "cannot open" files

Posted: 29 Sep 2022 00:01
by miskox
"Out of memory" would probably mean that findstr.exe does not close all the handles it opens. I would say that this is a bug in FINDSTR: I had a similar situation a week or two ago with FINDSTR on Windows 10, where I received an Out of memory error, but there was no /S switch. I don't remember what exactly I was doing, so I can't reproduce it.

From what I see you are *not* interested if *contents* of the files are the same, right? Maybe you could just use ROBOCOPY instead?

Edit: after some thinking: maybe there are too many open handles?

Saso

Re: FINDSTR "out of memory" and "cannot open" files

Posted: 29 Sep 2022 06:36
by Ken852
Hello Saso! Thanks for replying.

I am really a noob when it comes to commands and stuff, but I am somewhat familiar with the term "handle". It's like a number of memory allocations and stuff a process gets? I think you're onto something here. If this is the cause for "out of memory", how do you get around it?

What I really wanted to do was very basic - or so I thought. I just wanted to do a string search within all the text files, within all of the folders below where I was at (the working or current directory), and just have it report back if a match was found and where (in what file).

This is the command I used most recently:

Code:

FINDSTR /S /I "Project.45" *.txt

Like I said, I already found a way to use PowerShell to do what I want. See this command for comparison:

Code:

Get-ChildItem -Path .\*.txt -Recurse | Select-String -Pattern 'Project.45'

Are these two commands not equivalent?

I also noticed that I was able to do a search like this:

Code:

FINDSTR /S /I "Project" *.txt

So the command appears to work if I leave out ".45" in the search string.

Previously, I was also able to list the directories with DIR and pipe that to FINDSTR, and it returned the same matches as PS did, with the exception of some files with odd characters in their names, where it reported "cannot open". But I didn't run into "out of memory" at that time. This is what made me dodge FINDSTR altogether and look for an alternative, because it appeared unreliable. Later I remembered why it had worked the first time and not the second time and other times. I had moved all the files (or "end nodes", as I think they're also called) from their respective folders to the root folder or working directory, or just one step below. I'm not sure exactly what level it was now; I just remember I had "unfolded" all the files from their folders.

I think the command was something like this:

Code:

DIR /B /S *.txt | FINDSTR /I "Project.45" *

I have now tried to replicate all of this, in other words "unfolding" the files one level up, and I got the same results I had seen previously. There was no "out of memory" error with this more "flat" folder structure. The folder structure I started out with was not very deep either, but there are a few thousand of them, and I think that's problematic.


Interestingly, and also worth mentioning, there was one file with Cyrillic letters and an em dash in its name, but it did not throw an error (the rest of the name used Latin/English letters). So I was not entirely right when I said all the files used English-only characters. You know, it's not easy to enforce that users stick to specific characters when you're working with a large multi-national team of users.

The Ê was replaced with ^ in the error output, and the È was replaced with <.

The third "cannot open" error in this test was not related to any odd characters in the file name that FINDSTR could have had the same type of issue with. The third file only used regular ASCII characters, i.e. English letters, regular dashes, numbers, dots and parentheses. The problem with this one appears to be related to folder structure, because unlike with the other two files, I failed to "unfold" this file one level up, so it was kept in its original location just like in the other tests, where I encountered the 17 consecutive "cannot open" errors and one "out of memory" error.

All in all, I think this just shows how unreliable FINDSTR is. The "cannot open" errors can be caused both by a deep folder structure and by the characters used in the file names; either one or both will cause this error. That's all I can conclude from all this, with my limited understanding of command lines. This stuff can drive a person insane. I think. This is proper madness.

Regards,
Ken

Re: FINDSTR "out of memory" and "cannot open" files

Posted: 29 Sep 2022 08:50
by Aacini
I suggest you use the findstr /L switch this way:

Code:

FINDSTR /L /S /I "Project.45" *.txt

... so findstr doesn't treat the dot as a regular expression character. Perhaps this alone will help avoid the error. Also, I suggest you drop the /I switch if you don't really need it...
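
To illustrate the point about the dot, here is a small Python sketch. It uses Python's `re` module, not FINDSTR itself, so only the general regular expression behaviour carries over, and the sample strings are made up. As a regular expression, `Project.45` also matches text where any single character sits between `Project` and `45`, while the escaped and the literal forms match a real dot only.

```python
import re

# Made-up sample lines; only the first contains the literal text "Project.45".
samples = ["Project.45", "Project-45", "ProjectX45", "Project45"]

def regex_hits(pattern, lines):
    """Return the lines in which the regular expression matches somewhere."""
    return [line for line in lines if re.search(pattern, line)]

def literal_hits(text, lines):
    """Return the lines that contain the text exactly as written."""
    return [line for line in lines if text in line]

print(regex_hits(r"Project.45", samples))   # unescaped dot: any single character
print(regex_hits(r"Project\.45", samples))  # escaped dot: a real dot only
print(literal_hits("Project.45", samples))  # plain substring search
```

The first call returns three of the four samples; the other two calls return only `Project.45`.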

Antonio

Re: FINDSTR "out of memory" and "cannot open" files

Posted: 29 Sep 2022 23:36
by miskox
Good catch Aacini! I never expected that /L would be required because help shows:

Code:

  /L         Uses search strings literally.
  /R         Uses search strings as regular expressions.
So I assumed that /L would be the default. But on reflection, the help does not say which one is the default, so /R could just as well be the default. Again, Aacini - very good catch.

Anyway, I think that this memory leak (caused by the regular expression search and not by too many open files, because with /L it works) is for Microsoft to solve. I guess that findstr.exe allocates memory when doing a search and does not release it afterwards. One way to test this (assuming, of course, that terminating an .exe frees its memory) would be to use FOR to process each file individually, with a new FINDSTR.EXE call for each file.
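
The one-file-at-a-time idea can also be tried outside of FINDSTR. Below is a minimal Python sketch, assuming Python 3 is available and the text files are in an ASCII-compatible encoding. `search_tree` is a hypothetical helper, not part of any tool mentioned here. It walks the tree, opens each .txt file on its own, does a case-insensitive literal search, and collects the files it could not open instead of stopping.

```python
import os

def search_tree(root, needle, suffix=".txt"):
    """Search every file under root whose name ends in suffix.

    Returns (matches, failures): matches is a list of (path, line_number, line),
    failures is a list of (path, error_message) for files that could not be read.
    """
    needle = needle.lower()
    matches, failures = [], []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(suffix):
                continue
            path = os.path.join(folder, name)
            try:
                # errors="replace" keeps undecodable bytes from aborting the read
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line.lower():
                            matches.append((path, number, line.rstrip("\n")))
            except OSError as error:
                # Record the failure and carry on with the next file
                failures.append((path, str(error)))
    return matches, failures

# Example call (the folder is a placeholder):
# found, unreadable = search_tree(".", "Project.45")
```

Because every file is opened and closed separately, one unreadable file ends up in the failure list without affecting the rest of the search.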

Saso

Re: FINDSTR "out of memory" and "cannot open" files

Posted: 02 Oct 2022 13:22
by Ken852
Thank you both for showing interest in this problem.

Sadly, I was unable to make it work with FINDSTR, neither with the /L option nor with the /R option.

I made a fresh copy of all the files for this test, and this time around I got 22 "cannot open" errors (compared to 17 previously) before the "out of memory" appeared. I was monitoring RAM and, strangely, I noticed that RAM usage was going down (less and less in use) as FINDSTR was spitting out "cannot open" errors (before it ended at "out of memory").

Interestingly, I tried going down one folder level and then using "..\*.txt" to search "above my own head" so to speak and work my way down. This made it spit out more "cannot open" errors than I could count. Hundreds of them. I had to abort with Ctrl+C.

I'm not sure whether FINDSTR defaults to using RegEx. What I do know - by painful experience - is that it defaults to being case-sensitive. So yes, I do need the /I option to make a case-insensitive search. This is unlike Select-String in PowerShell, which defaults to being case-insensitive and you have to tell it to be case sensitive if this is what you want. This way, Select-String returns more results than FINDSTR with less effort.

I tried to do a literal search like this:

Code:

/L "Project.45"

As well as an explicit RegEx search with an escape like this:

Code:

/R "Project\.45"

Neither of these worked in my case.

Does FINDSTR differentiate between double quotes ("") and single ('') quotes in search parameters? How about DOS or CMD in general? Like in a shell script on a Unix-like system?

Nonetheless, I tried both. It didn't help my case.

Code:

/L "Project.45"
/L 'Project.45'
/R "Project\.45"
/R 'Project\.45'

Another interesting oddity is that I was able to get rid of the errors in my previous test, the one with three errors. As I suspected, this was related to characters used in file names. But for two of the three errors, it was not the French or the Cyrillic characters that were causing it, but rather the em dash (U+2014, —). After I removed the em dashes from the affected file names and re-ran FINDSTR, it no longer reported errors for those two files.

The reason I noticed those French characters is that they were replaced with caret (^) and less-than (<) characters, but only in the command output. They were not the reason FINDSTR could not read these files. The cause was the em dash. After I removed the em dash from both file names, the files became readable.
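
Since the em dash turned out to be the trigger, it may help to list the file names that contain anything outside plain ASCII before running FINDSTR. This is a minimal Python sketch, assuming Python 3 is available; `odd_characters`, `describe` and `non_ascii_names` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
import os

def odd_characters(name):
    """Return the distinct non-ASCII characters in a file name, sorted by code point."""
    return sorted({ch for ch in name if ord(ch) > 127})

def describe(chars):
    """Format characters as code points, e.g. U+2014 for the em dash."""
    return ", ".join(f"U+{ord(ch):04X}" for ch in chars)

def non_ascii_names(root):
    """Return (path, code_points) for every file under root with a non-ASCII name."""
    flagged = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            odd = odd_characters(name)
            if odd:
                flagged.append((os.path.join(folder, name), describe(odd)))
    return flagged

# Example call (the folder is a placeholder):
# for path, code_points in non_ascii_names("."):
#     print(code_points, path)
```

The sketch only flags names for review. It does not decide which characters FINDSTR can or cannot handle.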

The third file was a different matter altogether. This one had no odd characters in its name, none that would cause an error from what I can tell. The problem here appeared to be that this file was one or two folder levels too deep. So I made the related error go away simply by moving that file one or two levels up the folder structure, and once I ran FINDSTR again, that error was gone as well. There were no more errors at that point. But the cost of arriving at that point was my sanity, i.e. too expensive.
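
One thing that could be checked for a file like this is how long and how deep its full path is. This is only a hypothesis, not something confirmed in this thread: the classic Windows API limit for a full path is 260 characters (MAX_PATH), and the assumption here is that FINDSTR might be affected by it. A minimal Python sketch, with `path_report` as a hypothetical helper name:

```python
import os

def path_report(root, limit=260):
    """Return (length, depth, path) for files whose absolute path reaches limit.

    depth counts the folder levels between root and the file.
    limit=260 is the classic Windows MAX_PATH value, used here as an assumption.
    """
    root = os.path.abspath(root)
    report = []
    for folder, _subfolders, files in os.walk(root):
        relative = os.path.relpath(folder, root)
        depth = 0 if relative == "." else relative.count(os.sep) + 1
        for name in files:
            path = os.path.join(folder, name)
            if len(path) >= limit:
                report.append((len(path), depth, path))
    # Longest paths first
    return sorted(report, reverse=True)

# Example call (the folder is a placeholder); a lower limit lists near misses too:
# for length, depth, path in path_report(".", limit=200):
#     print(length, depth, path)
```

If the files that fail turn out to have the longest paths, that would support the idea that path length, rather than depth as such, is what matters.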

Again, thank you both for showing interest and helping me with this situation. This was a good exercise I think, but it was a bit too much for me to process. To tell you the truth, this was the very first time for me to FINDSTR – it may as well be the last.

Re: FINDSTR "out of memory" and "cannot open" files

Posted: 03 Oct 2022 00:03
by miskox
@Ken852: very good findings. I really think that this should be reported to Microsoft (with all the steps required to reproduce these errors, if there are any beyond what you wrote).
I really don't think that FINDSTR should run out of memory. I am using findstr all the time (XP and Win 10) without any problems, but I really just search one (maybe more) file(s) in the *current* folder, although I might make many output files (viewtopic.php?t=5495 - it turned out that ESET had a memory leak).

Did you try to use FOR /R so that each new FINDSTR search runs on *one* file only?

Code:

FOR /R "." %%f in (*.txt) do echo %%f

Code:

REM /s removed below; in a batch file use %%f, at the command prompt use %f
REM "%%f" is quoted because the paths contain spaces
FOR /R "." %%f in (*.txt) do findstr /L /i "Project.45" "%%f"

Maybe you could add /C (and then I guess /L could be removed)?

Code:

FOR /R "." %%f in (*.txt) do findstr /i /C:"Project.45" "%%f"

What about your original search command:

Code:

findstr.exe /i /s /C:"Project.45" *.txt

Here /C is used instead of /L.

Saso