robust line counter
Below is a little program to count the number of lines in a text file. It has the following features:
- Supports Windows and Unix line endings.
- Handles long lines.
- Copes with notorious show-stoppers like Control Z and the Null Character.
- Works with Unicode as well as ANSI text files.
- Can count lines in files with more lines than the Batch integer maximum (2147483647).
Last edited by Sponge Belly on 28 Feb 2018 13:59, edited 4 times in total.
Re: robust line counter
I use this in my batches:
Code:
rem Number every line with findstr /n, then count the lines containing the colon it adds.
findstr /R /N "^" %file% | find /C ":">lines
< lines set /p "num="
del lines
Re: robust line counter
I just use good old find /C. It works for 99.9% of the data I work with.
Re: robust line counter
Sponge Belly wrote:Hello All!
Below is a little program to count the number of lines in a text file. It has the following features:
- Supports Windows and Unix line endings (but not MacOS 9 or earlier).
- Handles extremely long lines.
- Copes with notorious show-stoppers like Control Z and the Null Character.
Can you test this on your suite of text files?
Code:
find /c /v "" <file
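A minimal sketch of capturing that count into a variable inside a batch script, assuming the file name arrives as the first argument (%~1); the variable name lines is only illustrative. The redirection is escaped as ^< so it is performed inside the for /f command, and find prints just the number when it reads from standard input. The null-character caveat raised in this thread still applies.

```batch
@echo off
rem Sketch: store the line count reported by find /c /v "" in a variable.
rem Assumes the file name is passed as the first argument.
set "lines=0"
for /f %%n in ('find /c /v "" ^< "%~1"') do set "lines=%%n"
echo(file "%~1" has %lines% lines
```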
Re: robust line counter
Hi Again!
Thanks for your replies.
@Foxi: I did run tests, too. One thing I discovered is that find turns null characters into newlines, which causes an incorrect count.
@Squashman: You’re right, of course. Good old find /c /v "" < file is just fine… 99% of the time. My code is an attempt to deal with the less than 1% of edge cases where you may find yourself working with text you can’t make any assumptions about.
@Ranguna173: D’oh! That’s so clever and so simple. Wish I’d thought of it! And it isn’t tripped up by null characters, for some reason. Once again I am reminded of how much I have yet to learn…
- SB
Re: robust line counter
Sponge Belly wrote:@Foxi: I did too run tests. One thing I discovered is that find turns null characters into newlines which causes an incorrect count.
Thanks, confirmed in Win 8 too.
But so does this when there's a null at the beginning of the line anyway.
Code:
findstr /r /n "^" a.bat |find /c ":"
And your code also fails here. This is the test file, which should report 3 lines when checked; the three techniques here report 4 lines: http://www.astronomy.comoj.com/testbat.zip
Re: robust line counter
Hi Foxi!
Had a look at your test file. It has four lines. The fourth line doesn’t end with a CR+LF, but it still counts. Get a hex dump from certutil if you don’t believe me.
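For example, something along these lines would write a hex dump to a temporary file and print it. It is a sketch that assumes your version of certutil supports the -encodehex verb; the .hex file name is only an illustration. Lines terminated by CR+LF end in the bytes 0d 0a, and a final run of bytes without them still counts as a line.

```batch
@echo off
rem Sketch: dump a file as hex to inspect its line endings.
rem Assumes certutil -encodehex is available; the .hex name is illustrative.
certutil -f -encodehex "%~1" "%~1.hex" >nul
type "%~1.hex"
del "%~1.hex"
```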
I tried both mine and Ranguna173's code on a test file with null characters at the beginning, middle and end of lines. They weren’t fooled. Windows 7 Home Premium, fwiw.
Btw, Ranguna173's golden nugget can be rewritten as:
Code:
for /f %%l in ('
findstr /n "^" "%~1" ^| find /c ":"
') do set lines=%%l
echo(file "%~1" has %lines% lines
Hope this helps!
- SB
Re: robust line counter
What does it matter if the last line doesn't have a CRLF? If the previous line does, it should still be counted as a line.
Re: robust line counter
Sponge Belly wrote:Hi Foxi!
Had a look at your test file. It has four lines. The fourth line doesn’t end with a CR+LF, but it still counts. Get a hex dump from certutil if you don’t believe me.
Yes, you are right. I was counting CRLF and didn't see the obvious.
Btw, Ranguna173's golden nugget can be rewritten as:
Code:
for /f %%l in ('
findstr /n "^" "%~1" ^| find /c ":"
') do set lines=%%l
echo(file "%~1" has %lines% lines
That technique fails with a.txt in this file: http://www.astronomy.comoj.com/testfile.zip
It reports 4 lines instead of 3 because of the NULL issue in FIND.EXE.
This works however:
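Code:
@echo off
rem The number before the first colon on the last findstr /n line is the count.
rem Start at 0 so an empty file reports 0 instead of an empty value.
set "lines=0"
for /f "delims=:" %%a in ('findstr /n "^" "%~1"') do set lines=%%a
echo %lines% lines
pause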
Re: robust line counter
Dammit, Foxi. You’re right.
If a colon comes after a null character on a line of output from findstr /n "^" "%~1", find /c ":" will count it as two lines. That torpedoes Ranguna173’s otherwise elegant solution.
As to your second point, try this (note the null character on line 4):
Code:
findstr /n "^" a2z.txt
Output:
1:the quick
2:brown fox
3:jumps over
4:the<NUL>lazy
5:dog
But if you wrap it up in a for /f loop, you’re in for a surprise:
Code:
for /f delims^= %%l in ('
findstr /n "^" a2z.txt
') do echo(%%l
Output:
1:the quick
2:brown fox
3:jumps over
4:the5:dog
Near as I can tell, if a null character is output inside the in (...) clause of a for /f loop, the null character and anything following it is discarded up until the end of line. The newline is suppressed and the next line is appended instead. This all happens in one iteration of the loop, not two. In fact, the line to be output will keep growing so long as a null character is found on each successive line.
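To see what that means for the last-line-number technique posted earlier in the thread (for /f "delims=:" ... do set lines=%%a), here is a sketch against the a2z.txt sample above, assuming the behaviour just described: the joined line 4:the5:dog is handled in a single iteration whose first token is 4, so the reported count would be 4 rather than 5.

```batch
@echo off
rem Sketch: the "last findstr /n line number" technique applied to a2z.txt,
rem the five-line sample above with a null character on line 4.
set "lines=0"
for /f "delims=:" %%a in ('findstr /n "^" a2z.txt') do set "lines=%%a"
rem If line 5 is appended to line 4 as described, the last iteration
rem sees "4:the5:dog", so lines ends up as 4 instead of 5.
echo(%lines% lines
```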
But the good news is that the code in the OP still holds up.
- SB
Re: robust line counter
Sponge Belly wrote:Near as I can tell, if a null character is output inside the in (...) clause of a for /f loop, the null character and anything following it is discarded up until the end of line. The newline is suppressed and the next line is appended instead.
That's interesting. Another gotcha to recall when needed.
Re: robust line counter
Jump to this post for the latest version of the subroutine.
Last edited by Sponge Belly on 28 Feb 2018 14:00, edited 2 times in total.
Re: robust line counter
Try that on a file with more than 65K lines.
Re: robust line counter
I see that the method used by SB is supposed to handle NULs etc.
This is effective for plain text.
Code:
find /c /v "" < FILENAME