Unable to understand how to use findstr...

Bucko
Posts: 14
Joined: 05 Jun 2018 01:01

Unable to understand how to use findstr...

#1 Post by Bucko » 09 Jun 2018 08:00

My mental abilities are not enough :wink: to understand how findstr should be used to extract part of a text file, more precisely all lines from line number n to line number m, into an output file.

Could someone here help me?

Please note that my question is limited to findstr, because such a solution could be used for binary files too (which are my real target).

Thank you in advance.

ShadowThief
Expert
Posts: 1166
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: Unable to understand how to use findstr...

#2 Post by ShadowThief » 09 Jun 2018 09:45

Binary files don't necessarily have newline characters, so extracting lines M through N may not always be possible. Also, what gave you the idea that findstr was going to be useful for this?

You can, however, use a for loop.

Code:

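@echo off
setlocal enabledelayedexpansion

:: Specify which line to start returning from
set get_line=7

:: Specify the last index of the return loop. The loop counts from 0,
:: so return_count+1 lines are returned.
set return_count=5
:: In this example, lines 7 through 12 will be returned

:: Trial and error made me add this line. Don't touch it.
:: (The skip loop also counts from 0, so after subtracting 2 it skips get_line-1 lines.)
set /a get_line-=2

(
	for /L %%A in (0,1,%get_line%) do set /p skip_line=
	for /L %%A in (0,1,%return_count%) do (
		rem set /p leaves the variable unchanged on an empty line, so clear it first
		set "print_line="
		set /p print_line=
		rem echo( prints an empty line instead of "ECHO is off."
		echo(!print_line!
	)
) <file.txt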

Bucko
Posts: 14
Joined: 05 Jun 2018 01:01

Re: Unable to understand how to use findstr...

#3 Post by Bucko » 09 Jun 2018 12:22

ShadowThief, thank you very much for your support!

You raise questions that are not directly relevant to my setting. All the binary files I intend to use have "lines" (i.e. newline characters), and these "lines" are not too long for findstr (there is a length limit). So findstr would definitely work with my binary data, and would therefore be useful to me. Why do you wonder so much about this idea?

Your script works fine with text files. (I corrected the variable name in the second for and added >> output.txt to echo.) One question: At the end of the last line a newline is added (in output.txt, a blank 6th line). Do you have an idea how this could be avoided?

But, of course, your script doesn't work with binary files (NUL bytes, 00!). That's why I am asking how to do the same using findstr...

ShadowThief
Expert
Posts: 1166
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: Unable to understand how to use findstr...

#4 Post by ShadowThief » 09 Jun 2018 12:47

If your binary files have newlines, you can probably stick a

Code:

type binary_file.exe >file.txt
at the top of the script and see if that works.

Bucko
Posts: 14
Joined: 05 Jun 2018 01:01

Re: Unable to understand how to use findstr...

#5 Post by Bucko » 09 Jun 2018 13:16

I tested this, yes, and it doesn't work, as I said (the NUL character is ignored). This discussion only takes us further away from my original question.

misol101
Posts: 475
Joined: 02 May 2016 18:20

Re: Unable to understand how to use findstr...

#6 Post by misol101 » 09 Jun 2018 15:10

Does it absolutely *have* to be findstr? If so, then I can't help you, but if not, then this is precisely the kind of task I would use some quick-and-dirty C code for:

Code:

#include <stdio.h>
#include <stdlib.h>	/* atol */

int main(int argc, char **argv) {
	FILE *fp, *ofp = NULL;
	long si, ei;

	if (argc <= 3) { puts("Usage: cpbinl file startline endline"); return 1; }
	si = atol(argv[2]); ei = atol(argv[3]);
	if (si < 1 || ei < 1 || ei < si) { puts("Invalid index"); return 1; }
	fp = fopen(argv[1], "rb");
	/* Output file name: the input name with its first character replaced by '#' */
	argv[1][0] = '#'; if (fp) ofp = fopen(argv[1], "wb");
	if (fp && ofp) {
		long line = 1;
		size_t read;
		unsigned char ch;

		/* Copy byte by byte; a line ends at each LF (0x0a) */
		do {
			read = fread(&ch, 1, 1, fp);
			if (read && line >= si && line <= ei) fwrite(&ch, 1, 1, ofp);
			if (read && ch == 0xa) line++;	/* only test ch when a byte was actually read */
		} while (read);

		fclose(fp);
		fclose(ofp);
	} else {
		if (fp) fclose(fp);
		puts("File error");
		return 1;
	}
	return 0;
}
I attached a binary compiled with tcc. It reads and writes byte by byte, so it's no speed demon :mrgreen:
Attachments
cpbinl.zip
(1.48 KiB) Downloaded 425 times
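
Since the byte-by-byte loop is the slow part, a buffered variant may be worth trying. The following is only a sketch and is not part of the attached program: the function name `copy_line_range` and the 64 KiB buffer size are assumptions made for illustration. It keeps the same definition of a line (everything up to and including an LF byte) and copies NUL bytes like any other byte.

```c
#include <stdio.h>
#include <string.h>

/* Copy lines start..end (1-based, inclusive) from in to out.
   A line ends at each LF byte (0x0a); NUL bytes are copied unchanged.
   Returns 0 on success, -1 on a read or write error. */
int copy_line_range(FILE *in, FILE *out, long start, long end) {
	unsigned char buf[65536];
	long line = 1;
	size_t n;

	while (line <= end && (n = fread(buf, 1, sizeof buf, in)) > 0) {
		size_t pos = 0;
		while (pos < n && line <= end) {
			/* Find the end of the current line inside the buffer, if any */
			unsigned char *lf = memchr(buf + pos, 0x0a, n - pos);
			size_t len = lf ? (size_t)(lf - (buf + pos)) + 1 : n - pos;
			if (line >= start && fwrite(buf + pos, 1, len, out) != len) return -1;
			pos += len;
			if (lf) line++;	/* a line that spans two buffers is counted once */
		}
	}
	return ferror(in) ? -1 : 0;
}
```

Both streams should be opened in binary mode ("rb" and "wb"), as in the program above. Unlike the original loop, this sketch stops reading as soon as line `end` has been written.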

Bucko
Posts: 14
Joined: 05 Jun 2018 01:01

Re: Unable to understand how to use findstr...

#7 Post by Bucko » 10 Jun 2018 00:41

That's just fantastic, misol101!

It does not solve my problem with using findstr (and I still hope someone will show me a solution), but:

Your answer allows me to think of a very different concept for my general project. I did not know that C and TCC make it possible to write such extremely small programs.

I am extremely excited and thank you very much for your answer.

May I send you a PM?

misol101
Posts: 475
Joined: 02 May 2016 18:20

Re: Unable to understand how to use findstr...

#8 Post by misol101 » 10 Jun 2018 05:23

Sure thing, though I’m not sure how much time I will have to look into it

sst
Posts: 93
Joined: 12 Apr 2018 23:45

Re: Unable to understand how to use findstr...

#9 Post by sst » 10 Jun 2018 21:29

Bucko wrote:
09 Jun 2018 12:22
... One question: At the end of the last line a newline is added (in output.txt, a blank 6th line). Do you have an idea how this could be avoided?
If the lines do not have leading white space or an equal sign (=), `set /p "=!var!"<nul>outFile` can be used for that. On Vista and later, set /p removes leading white space from the prompt string.
A less efficient but working alternative is the prompt trick:

Code:

setlocal
.
.
set "prompt=!var:$=$$!"
cmd /d /k <nul>outFile
.
.
endlocal
So, using the prompt trick as an example, that part of the code can be written as follows:

Code:

	for /L %%A in (0,1,%return_count%) do (
		set /p print_line=
		set "prompt=!print_line:$=$$!"
		cmd /d /k <nul
		if %%A LSS %return_count% echo,
	)
set /p is the more efficient and cleaner approach, but it cannot be used with arbitrary data. Either way, that is a lot of overhead just to eliminate the last newline from the output.

sst
Posts: 93
Joined: 12 Apr 2018 23:45

Re: Unable to understand how to use findstr...

#10 Post by sst » 11 Jun 2018 01:12

Bucko wrote:
09 Jun 2018 08:00
Please note that my question is limited to findstr, because such a solution could be used for binary files too (which are my real target).
That is not necessarily true. As far as I know, findstr on its own does not have the capability to filter specific line numbers. So the assumption that any solution to your task involving findstr can automatically handle binary files is not true.

However, there is a method which can handle binary files based on the criteria you have specified. There is also a method which can only be used to handle text files (or, more precisely, files without NUL bytes) but in a much more efficient way.

findstr solution to handle text files:

Code:

@echo off
setlocal EnableExtensions DisableDelayedExpansion

:: Parameters
set "Input=input.txt"
set "Output=output.txt"
:: The minimum value for startLine is 1
set /a "startLine=5, endLine=9"


set /a "startLine-=1"
if %startLine% EQU 0 (set "skip=") else set "skip=skip=%startLine%"
(for /F "%skip% tokens=1* delims=:" %%K in ('findstr /N /R "^" "%Input%"') do (
    if %%K LEQ %endLine% (echo(%%L)
))>"%Output%"
The above will work well if the difference between endLine and the total number of lines in the input file is relatively small, for example when endLine=130 and totalLines=134.
But as the difference gets bigger (endLine=130, totalLines=800), performance will degrade, because a batch script has no means of breaking out of a FOR loop immediately: once the job is done there is no need to remain in the loop, yet it keeps iterating over the remaining lines. The only command that can immediately break a FOR loop is `exit`, but that also terminates the host cmd instance. The solution is to execute the FOR loop in a child instance of cmd along with the exit command.
So the FOR loop block can be replaced with this:

Code:

>"%Output%" cmd /e:on /v:off /d /c for /F "%skip% tokens=1* delims=:" %%K in ^('findstr /N /R "^" "%Input%"'^) do @if %%K LEQ %endLine% ^(echo(%%L^) else exit
This can also be a performance killer if the difference between endLine and the total line count is small and you are processing many files under that condition, because every file then pays for starting a child cmd instance. So I think ShadowThief's solution may better suit your needs in this regard. You should test and compare.

findstr solution to handle binary files:

Code:

@echo off
setlocal EnableExtensions DisableDelayedExpansion

:: Parameters
set "Input=input.bin"
set "Output=output.bin"
set /a "startLine=5, endLine=9"


set /a "cureLine=startLine-1, Lines=endLine-startLine+1, MaxDigits=9"
(for /L %%. in (1,1,%Lines%) do (
    set /a "cureLine+=1, Num=cureLine, LeadChars=1"
    for /L %%. in (1,1,%MaxDigits%) do set /a "LeadChars+=!!Num, Num/=10"
    findstr /N /R "^" "%Input%"|(findstr /R "^%%cureLine%%:")|((for /L %%. in (1,1,%%LeadChars%%) do @pause)>nul & findstr /R "^")

    REM // This line takes into account the possibility for disabled command extensions. Use instead of above if that is a concern.
    REM findstr /N /R "^" "%Input%"|(findstr /R "^%%cureLine%%:")|cmd /e:on /d /c ^(for /L %%^^^. in ^(1,1,%%LeadChars%%^) do @pause^)^>nul ^& findstr /R "^"
))>"%Output%"
This is based on this assumption:
Bucko wrote:
09 Jun 2018 12:22
All my binary files (which I intend to use) have "lines" (i.e. newline characters), and these "lines" are not too long for findstr (there is a length limit). So findstr definitely would work with my binary data.
Be aware that this method is extremely slow and inefficient in terms of performance but should do the job. That was the best I could come up with.
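
The `LeadChars` arithmetic in the script is easy to misread, so here is the same calculation as a small C sketch. As I read the script, `findstr /N` prefixes each line with its number and a colon, and the inner `pause` loop is there to consume exactly that prefix before the last findstr passes the rest through. The function name `lead_chars` is made up for this illustration; the 9 iterations mirror `MaxDigits=9`.

```c
#include <stdio.h>

/* Length of the "N:" prefix that findstr /N puts in front of line N:
   one byte per decimal digit of N, plus one for the colon. */
int lead_chars(long line_number) {
	int count = 1;			/* the colon */
	long num = line_number;
	int i;

	for (i = 0; i < 9; i++) {	/* 9 corresponds to MaxDigits in the script */
		count += (num != 0);	/* same effect as LeadChars+=!!Num in set /a */
		num /= 10;		/* drop the last digit, like Num/=10 */
	}
	return count;
}
```

For line 5 this gives 2 (the digit and the colon), and for line 130 it gives 4.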
