Determining the number of lines in a file.
Moderator: DosItHelp
Re: Determining the number of lines in a file.
I think I can use Judago's Divide.bat
http://judago.webs.com/content/Scripts/divide.txt
Re: Determining the number of lines in a file.
I was thinking about making this post a separate thread, but since it is going to be integrated into this one script, I am just going to post it here.
As I said in my last post, I am going to use Judago's Divide.bat to calculate the number of lines, because we may run into file sizes larger than 2147483647, the largest signed 32-bit integer that SET /A can handle. I'm going to take the total line length (data + EOL) from the code that Dave gave me and divide the file size by it.
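A minimal Python sketch of that arithmetic (Python is used here only to illustrate the math; the thread itself is all batch). It assumes every line has the same length, so the line count is the file size divided by the data length plus the EOL length, and a non-zero remainder means the file is not purely fixed-length. The helper name count_fixed_records is hypothetical. Python integers have no 2147483647 ceiling, which is exactly the limit the batch version has to work around.

```python
INT32_MAX = 2**31 - 1  # largest value SET /A can hold (signed 32-bit)


def count_fixed_records(file_size: int, data_len: int, eol_len: int) -> tuple[int, int]:
    """Return (records, remainder) for a file of fixed-length lines.

    Hypothetical helper: assumes each line is data_len bytes of data
    followed by eol_len bytes of end-of-line (1 for LF, 2 for CR+LF).
    """
    line_len = data_len + eol_len
    if line_len <= 0:
        raise ValueError("line length must be positive")
    return divmod(file_size, line_len)


# Size and record layout reported later in this thread for EST2.txt (LF only)
print(count_fixed_records(25690405, 1026, 1))
# A 4 GiB file is already beyond what SET /A can represent
print(4 * 1024**3 > INT32_MAX)
```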
Judago's script does seem to return the correct MOD (remainder) when you specify zero decimal places.
Code: Select all
E:\batch files\HEAD>divide.bat 24627955 / 985 "" 0
25003 - Leftover: (this isn't always the mod result)
E:\batch files\HEAD>divide.bat 24627956 / 985 "" 0
25003 - Leftover:1 (this isn't always the mod result)
E:\batch files\HEAD>divide.bat 24627954 / 985 "" 0
25002 - Leftover:984 (this isn't always the mod result)
When you don't specify the number of decimal places, you get some weird output.
Code: Select all
E:\batch files\HEAD>divide.bat 24627956 / 985
25003.00101522 - Leftover:830 (this isn't always the mod result)
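The 830 is not noise. When the number of places is omitted, Divide.bat appends 8 zeros to the dividend (its default of 8 decimal places) before dividing, so the Leftover is the remainder of that scaled division rather than of 24627956 / 985. A small Python check of that reading; the helper name scaled_divide is hypothetical:

```python
def scaled_divide(dividend: int, divisor: int, places: int = 8) -> tuple[str, int]:
    """Fixed-point division: scale the dividend by 10**places, then divide.

    Returns (quotient formatted with `places` decimals, leftover), where the
    leftover is the remainder of the *scaled* division.
    """
    quotient, leftover = divmod(dividend * 10**places, divisor)
    whole, frac = divmod(quotient, 10**places)
    text = f"{whole}.{frac:0{places}d}" if places else str(whole)
    return text, leftover


print(scaled_divide(24627956, 985))     # matches the 8-place run above
print(scaled_divide(24627956, 985, 0))  # matches the 0-place run
```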
Now his code allows for it to return a variable and that seems to work fine.
Code: Select all
E:\batch files\HEAD>divide.bat 24627956 / 985 result 0
E:\batch files\HEAD>echo %result%
25003
But I also want to return the MOD (remainder), or what he calls the LeftOver. When I change the output code at the bottom to the following, I do not get what is stored in the array variable.
The only line I added is the second set command:
Code: Select all
if not "%~3"=="" (
endlocal
set %~3=%total%
set %~5=!input%chunk%!
) else (
echo %total% - Leftover:!input%chunk%! ^(this isn't always the mod result^)
endlocal
)
So when I run it like this.
Code: Select all
E:\batch files\HEAD>divide.bat 24627956 / 985 result 0 remainder
E:\batch files\HEAD>echo %result% %remainder%
25003 !input0!
Why does the LeftOver echo fine from inside the batch file but not get assigned to the variable the way I need? The remainder should be 1. Not sure what I am doing wrong. Probably a rookie mistake.
Re: Determining the number of lines in a file.
Without reading this lengthy thread or understanding everything, try: SETLOCAL ENABLEDELAYEDEXPANSION
http://judago.webs.com/content/Scripts/divide.txt
Code: Select all
:finish
if "%total:~0,1%"=="0" if not "%total:~1%"=="" set total=%total:~1%&&goto finish
if "%total:~0,1%"=="." set total=0%total%
set "Squashman=!input%chunk%!"
if not "%~3"=="" (
endlocal
set %~3=%total%
set "%~5=%Squashman%"
) else (
echo %total% - Leftover:!input%chunk%! ^(this isn't always the mod result^)
endlocal
)
exit /b 0
Re: Determining the number of lines in a file.
Thanks Ed. That worked. Didn't realize I needed to assign that variable to another variable before the ENDLOCAL. Delayed Expansion was enabled higher up in the script.
Re: Determining the number of lines in a file.
I wrote a very small program, LINE1LEN.COM, that reads bytes from stdin looking for the first LF character and displays the length of the first line. It returns an ERRORLEVEL of 2 if the line ends with CR+LF, or 1 if it ends with LF only.
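For readers who want to see the LINE1LEN.COM idea without the machine code, here is a Python sketch of it that reads in blocks instead of one byte at a time. Assumptions: the reported length excludes the terminator (the post doesn't say whether LINE1LEN counts the CR), and the return codes (2 for CR+LF, 1 for LF only) mirror the ERRORLEVEL values described above. The name first_line_length is hypothetical.

```python
import io


def first_line_length(stream, chunk_size: int = 4096) -> tuple[int, int]:
    """Return (length, eol_code) for the first line of a binary stream.

    length excludes the line terminator; eol_code is 2 for CR+LF, 1 for LF.
    Raises ValueError if no LF is found.
    """
    length = 0   # bytes seen before the first LF
    last = b""   # byte immediately before the current scan position
    while chunk := stream.read(chunk_size):
        pos = chunk.find(b"\n")
        if pos < 0:
            length += len(chunk)
            last = chunk[-1:]
            continue
        length += pos
        if pos > 0:
            last = chunk[pos - 1:pos]
        if last == b"\r":
            return length - 1, 2  # drop the CR from the data length
        return length, 1
    raise ValueError("no LF found in input")


print(first_line_length(io.BytesIO(b"ABC\r\nDEF\r\n")))
```

Tracking the byte before the LF across block boundaries is what lets a CR at the end of one block and the LF at the start of the next still be recognised as CR+LF.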
I also wrote a second program, NUMLINES.COM, that counts the LF characters it reads from stdin and displays the count. It returns an ERRORLEVEL of one or two digits that encode the last two characters of the file: 1=LF, 2=EOF, 0=any other. For example: 12=LF+EOF, 2=EOF only, etc. The number of lines is incremented by 1 if the last characters do not include an LF.
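A Python sketch of the NUMLINES.COM rules as described above; it is not a port of the .COM file. Assumptions: "EOF" means a Ctrl-Z byte (0x1A); the ending code packs the last two bytes as two digits (1=LF, 2=EOF, 0=other), which reproduces the 0, 1, 2, 10 and 12 cases handled by the batch file below; and "the last characters do not include an LF" is read as "the file, ignoring a final Ctrl-Z, does not end with LF". The name count_lf_lines is hypothetical.

```python
import io

LF = 0x0A
CTRL_Z = 0x1A  # assumption: "EOF" in the post means a Ctrl-Z byte


def _ending_digit(byte: int) -> int:
    """1 for LF, 2 for Ctrl-Z, 0 for anything else."""
    if byte == LF:
        return 1
    if byte == CTRL_Z:
        return 2
    return 0


def count_lf_lines(stream, chunk_size: int = 65536) -> tuple[int, int]:
    """Return (lines, ending_code) for a binary stream.

    lines counts LF bytes, plus 1 when the data (ignoring one final Ctrl-Z)
    does not end with LF. ending_code packs the last two bytes as two digits.
    """
    lf_count = 0
    tail = b""  # last two bytes seen so far
    while chunk := stream.read(chunk_size):
        lf_count += chunk.count(b"\n")
        tail = (tail + chunk)[-2:]

    ending_code = 0
    for byte in tail:  # second-to-last byte -> tens digit, last byte -> units
        ending_code = ending_code * 10 + _ending_digit(byte)

    content_end = tail[:-1] if tail.endswith(b"\x1a") else tail
    if content_end and not content_end.endswith(b"\n"):
        lf_count += 1  # unterminated last line
    return lf_count, ending_code


print(count_lf_lines(io.BytesIO(b"a\r\nb\r\n")))
print(count_lf_lines(io.BytesIO(b"a\n\x1a")))
```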
These programs have no limit on the size of the file they can read. The characters-per-line counter is a 16-bit value (up to 65535) and the number-of-lines counter is a 32-bit value. They are .COM executable files written in assembly language, but they are not the fastest possible because they read just one character at a time to keep them simple and small. However, I am confident that they will run faster than equivalent Batch methods.
Although I wrote these programs specifically for the requirements of this topic, they may also be of general use. For example, you no longer need the LINE1LEN program, because NUMLINES directly gives the real number of lines in the file, regardless of the length of each one.
Code: Select all
@echo off
if not exist line1len.com call :CreateLine1Len
if not exist numlines.com call :CreateNumLines
line1len < %1 > line1len.tmp
set line1eol=%errorlevel%
set /P line1len=< line1len.tmp
numlines < %1 > numlines.tmp
set fileEof=%errorlevel%
set /P numlines=< numlines.tmp
echo Length of first line: %line1len%
set /P =First line ends with: < NUL
if %line1eol% == 1 (echo LF only) else echo CR+LF
echo Number of lines in the file: %numlines%
set /P =The file ends with: < NUL
if %fileEof% == 0 echo Normal character (no LF or EOF)
if %fileEof% == 1 echo LF character only (no EOF)
if %fileEof% == 2 echo EOF character only (no previous LF)
if %fileEof% == 10 echo LF followed by a normal character (no EOF)
if %fileEof% == 12 echo LF and EOF characters
goto :EOF
:CreateLine1Len
setlocal DisableDelayedExpansion
set line1len=3ɲ+€ê!´)€ì!Í!:ÂtLAŠð´,€ì!Í!"Àuç2ÀþÀR².€ê!R:òu1þÀI‹ø‹Á3ÉAA2ÿ³+€ë!3Ò÷ó€Â0RA#ÀuóZ´#€ì!Í!âö‹Ç´LÍ!ëÀëÐ
setlocal EnableDelayedExpansion
echo !line1len!>line1len.com
exit /B
:CreateNumLines
setlocal DisableDelayedExpansion
set numlines=f3ɲ+€ê!´)€ì!Í!:Âu^|fAŠûŠØ´,€ì!Í!"ÀuäR:úuh²;€ê!:Úua°-,!ë °+,!fAëõ:ÚuPþÀëífA²;€ê!:ÚuâþÀþÀ².€ê!R‹øf‹Á3ÉAAf3Û³+€ë!f3Òf÷ó€Â0RAf#ÀuðZ´#€ì!Í!âö‹Ç´LÍ!ë„ë®ë¤ë²
setlocal EnableDelayedExpansion
echo !numlines!>numlines.com
exit /B
Re: Determining the number of lines in a file.
Well here is some interesting output from this batch.
Code: Select all
E:\batch files\HEAD>FileAttrib.bat EST3_CRLF.txt
FC: cannot open 25712336 - No such file or folder
The system cannot find the batch label specified - error
E:\batch files\HEAD>type FileAttrib.log
Filename Quantity RecLength EOL
25712336 1026 0D0A
EST3_CRLF.txt 1026 0D0A
E:\batch files\HEAD>dir EST3_CRLF.txt | find "EST3_CRLF.txt"
11/29/11 01:20 PM 25,712,336 EST3_CRLF.txt
It looks like it is putting the file size back into the FC command. Not sure why I am getting the batch label error.
I basically combined Dave's script, which gets the line length, with Judago's Divide.bat, which does math on large numbers, into one batch file. It takes the line length from Dave's code and uses Judago's division routine to divide the file size by the line length, which in theory gives the number of lines in the file, since all the files I use are fixed-length. I then write the results to a log file.
Code: Select all
@echo off
setlocal enableDelayedExpansion
:: Filename for 40 Quantity for 11 RecLength for 11 EOL for 4
echo.Filename Quantity RecLength EOL >FileAttrib.log
:loop
set "fSize=%~z1"
::Build a dummy file with length 32kbytes to do a binary compare with
::This file could be made larger or smaller, depending on requirements
<nul set /p ".=A" >dummy.txt
for /l %%n in (1 1 15) do type dummy.txt >>dummy.txt
::Use FC /B to compare with dummy. Use FINDSTR to locate offset and hex representation of each CR or LF
::Use FOR /F to only look at the 1st two instances.
for /f "tokens=1,2 delims=: " %%A in ('fc /b "%~1" dummy.txt ^| findstr /r /c:": 0D 41$" /c:": 0A 41$"') do (
if not defined eolOffset (
set /a "eolOffset=0x%%A, next=eolOffset+1, eolSize=1, next=eolOffset+1"
set "eol=%%B"
) else (
set /a "eolOffset2=0x%%A"
if "!eolOffset2!"=="!next!" if "!eol!" neq "%%B" (
set "eol=!eol!%%B"
set /a "eolSize+=1"
)
goto :break
)
)
:break
set /a "recordLen=eolOffset, lnLen=recordLen+eolSize"
CALL :division %fSize% / %lnLen% records 0 remainder
SET "OFile=%~1                                        ."
SET "records=%records%           ."
SET "recordLen=%recordLen%           ."
SET "eol=%eol%    ."
echo.%Ofile:~0,40%%records:~0,11%%recordLen:~0,11%%eol:~0,4%>>FileAttrib.log
:: echo Record 1 length = %recordLen%
:: echo eol = %eol%
:: echo eolSize = %eolSize%
:: echo Line 1 length = %lnLen%
endlocal
exit /b
:division
if "%~1"=="/?" (
echo.&echo USAGE:&echo.
echo "%~0" largenumber smallnumber [variablename] [places]
echo "%~0" largenumber / smallnumber [variablename] [places]
echo.&echo "largenumber" can floating point "smallnumber" can't.
echo Only the first [places] decimal places are used for input or
echo output. if [places] is omitted then the default of 8 is used.
echo.&echo To specify [places] with out [variablename] pass an empty set.
echo Below is an example:
echo.&echo "%~0" 46546545464.123456789 / 1024 "" 9&echo.
echo.&echo "smallnumber" must be below 2097153, "largenumber" can have
echo hundreds of places.&echo.&echo -Judago 2009/2010
exit /b 0
)
SETLOCAL ENABLEDELAYEDEXPANSION
set error=Invalid Input
if "%~1"=="" goto error
if "%~2"=="/" shift /2
if "%~2"=="" goto error
for /f "delims=1234567890." %%a in ("%~1%~2") do goto error
set divisor=%~2
set error=Divisor must be whole
if not "!divisor!"=="!divisor:.=!" goto error
if !divisor! gtr 2097152 (
set error=Divisor too large, limited to: 2097152
goto error
)
set dplace=%~4
if not defined dplace set dplace=8
for /f "delims=1234567890." %%a in ("%~4") do set dplace=8
set input=0%~1
for /l %%a in (1 1 %dplace%) do set input=!input!0
set error=Divide by zero
if "!divisor:0=!"=="" goto error
set chunk=
set total=
set /a fpos=dplace + 1
:isfloat
if not "!input:.=!"=="!input!" (
if not "!input:~-%fpos%,1!"=="." (
set input=!input:~0,-1!
goto isfloat
) else (
set input=!input:.=!
)
)
:split
if not "%input:~3%"=="" (
set /a chunk+=1
set input!chunk!=%input:~-3%
set input=%input:~0,-3%
if defined input goto split
) else (
set /a chunk+=1
set input!chunk!=%input%
)
:loop
if defined input%chunk% (
if "!input%chunk%:~0,1!"=="0" (
set input%chunk%=!input%chunk%:~1!
goto loop
)
) else (
set input%chunk%=0
goto pad
)
set chunkresult=0
:divide
If !input%chunk%! geq !divisor! (
If !input%chunk%! geq !divisor!000 (
set /a input%chunk%-=!divisor!000
set /a chunkresult+=1000
goto divide
) else (
If !input%chunk%! geq !divisor!00 (
set /a input%chunk%-=!divisor!00
set /a chunkresult+=100
goto divide
) else (
If !input%chunk%! geq !divisor!0 (
set /a input%chunk%-=!divisor!0
set /a chunkresult+=10
goto divide
) else (
set /a input%chunk%-=!divisor!
set /a chunkresult+=1
goto divide
)
)
)
)
:pad
if "!chunkresult:~2,1!"=="" set chunkresult=0!chunkresult!
if "!chunkresult:~2,1!"=="" set chunkresult=0!chunkresult!
set total=%total%%chunkresult%
set chunkresult=0
if %chunk% gtr 0 (
set /a chunk-=1
if !input%chunk%! gtr 0 (
set carry=!input%chunk%!
for %%a in (!chunk!) do set input!chunk!=!carry!!input%%a!
)
)
if %chunk% gtr 0 goto loop
if not defined total set total=0
if %dplace% gtr 0 set total=!total:~0^,-%dplace%!.!total:~-%dplace%!
:finish
if "%total:~0,1%"=="0" if not "%total:~1%"=="" set total=%total:~1%&&goto finish
if "%total:~0,1%"=="." set total=0%total%
set "mod=!input%chunk%!"
IF NOT DEFINED mod SET mod=0
if not "%~3"=="" (
endlocal
set %~3=%total%
set "%~5=%mod%"
) else (
echo %total% - Leftover:!input%chunk%! ^(this isn't always the mod result^)
endlocal
)
exit /b 0
:Error
1>&2 echo %error% - See "%~0 /?"
endlocal
exit /b 1
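Batch cannot read raw bytes, which is why the script above compares the file against a 32 KB dummy file with FC /B to find the first CR or LF. In a language that can read bytes directly, the same two steps (detect record length and EOL from the first 32 KB, then divide the file size by record length plus EOL size) look roughly like this Python sketch. The function names are hypothetical, the 40/11/11/4 column widths come from the comment in the script, and like the batch log it ignores the division remainder.

```python
import os


def detect_record_layout(head: bytes) -> tuple[int, str]:
    """Return (record_len, eol_hex) from the first bytes of a fixed-length file.

    The offset of the first CR or LF is taken as the record length. If the
    next byte is the *other* EOL byte, both count as the terminator, the same
    rule the FC /B loop above applies ("0D0A" rather than "0D0D").
    """
    for offset, byte in enumerate(head):
        if byte in (0x0D, 0x0A):
            eol = head[offset:offset + 1]
            nxt = head[offset + 1:offset + 2]
            if nxt in (b"\r", b"\n") and nxt != eol:
                eol += nxt
            return offset, eol.hex().upper()
    raise ValueError("no CR or LF found in the sampled bytes")


def attrib_line(name: str, size: int, head: bytes) -> str:
    """Build one log line: Filename(40) Quantity(11) RecLength(11) EOL(4)."""
    record_len, eol_hex = detect_record_layout(head)
    line_len = record_len + len(eol_hex) // 2
    records, _remainder = divmod(size, line_len)  # remainder ignored, as in the batch log
    return f"{name[:40]:<40}{str(records)[:11]:<11}{str(record_len)[:11]:<11}{eol_hex:<4}"


def file_attrib(path: str, sample_size: int = 32768) -> str:
    """Read only the first sample_size bytes (32 KB, like dummy.txt) plus the size."""
    with open(path, "rb") as f:
        head = f.read(sample_size)
    return attrib_line(os.path.basename(path), os.path.getsize(path), head)


print(attrib_line("t.txt", 10, b"ABC\r\nDEF\r\n"))
```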
Re: Determining the number of lines in a file.
Just bumping this to the top. I abandoned this project last year because I got frustrated with it not working. Going to start working on this again.
Re: Determining the number of lines in a file.
I threw in two echos after the :BREAK label and two after the CALL just to see where I am at in the code and to see what the %1 variable is.
Code: Select all
:break
echo OUT OF THE FC FOR LOOP
echo %1
set /a "recordLen=eolOffset, lnLen=recordLen+eolSize"
SET "OFile=%~1 ."
CALL :division %fSize% / %lnLen% records 0 remainder
echo Back from Division sub routine
echo %1
The output again baffles me. It looks like it is trying to run the For Loop with the FC command twice and it also looks like it is running the CALL to the division subroutine twice. It should only do each once.
Code: Select all
C:\batch files\Fixed_Attrib>Fixed_Attrib.bat EST2.txt
OUT OF THE FC FOR LOOP
EST2.txt
FC: cannot open 25690405 - No such file or folder
OUT OF THE FC FOR LOOP
25690405
Invalid Input - See ":division /?"
Back from Division sub routine
25690405
Back from Division sub routine
EST2.txt
C:\batch files\Fixed_Attrib>
Re: Determining the number of lines in a file.
Squashman wrote:The output again baffles me. It looks like it is trying to run the For Loop with the FC command twice and it also looks like it is running the CALL to the division subroutine twice. It should only do each once.
Short answer is that you have two :loop labels.
That said, I haven't followed all of it, and you could probably make it easier on others if you described the context in more detail e.g. what the various files are (est2.txt, FileAttrib.log, dummy.txt).
Liviu
Re: Determining the number of lines in a file.
Well you would have to read the entire thread to understand what everything is doing.
I never saw the two :LOOP labels. I have no idea how that got in there. I can't believe I never saw that.
I feel like a complete IDIOT!
I should have turned the ECHO ON to see all the commands executing!!!! Lesson Learned again!
Script is working as it should now.
If you haven't read through this thread before: basically, my script reports the attributes of fixed-length files. I work with fixed-length text files, which means every line within the file is the same length. I wanted a quick way to find out the line length and how many lines are in the file. Dave's code also lets me determine the end-of-line characters, i.e. whether each line ends with CR+LF or just LF.
Big Thanks to Dave for the Line Length code.
Thanks to Ed for helping me tweak Judago's script.
A Huge thanks to Liviu for finding that stupid coding mistake.
So basically my output looks like this.
Code: Select all
Filename Quantity RecLength EOL
EST2.txt 25015 1026 0A
I am going to tweak some more so that the script can take multiple input files. All my scripts at work do that so I will just make sure I don't do another stupid duplicate LABEL!
Re: Determining the number of lines in a file.
I came across this code to help identify the number of lines in a text file, and I have used it in my scripting too.
Perhaps this may help?
:: code start to identify number of lines in a text
Set FileName=NameOfFile
Set /a LineNumb=0
for /f "tokens=2 delims=:" %%a in ('find /c /v "" %FileName%') do set /a LineNumb=%%a
@Echo %FileName% has %LineNumb% lines.
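Roughly the same count in Python, for comparison. Like FIND /C /V "", it has to read the whole file, which is the cost the opening post wanted to avoid for very large files. The name count_all_lines is hypothetical, and edge cases (for example CR-only line endings) may not match FIND exactly.

```python
import io


def count_all_lines(stream) -> int:
    """Count lines in a binary stream, including an unterminated last line.

    Iterating a binary file object yields one item per LF-terminated line
    plus a final partial line, if any, so every byte of the file is read.
    """
    return sum(1 for _ in stream)


# With a real file: with open(path, "rb") as f: count_all_lines(f)
print(count_all_lines(io.BytesIO(b"a\r\nb\r\nc")))
```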
Re: Determining the number of lines in a file.
booga73 wrote:I came across this code to help identify number of lines in a text file, an I have used it in my scripting too.
Perhaps this may help?
:: code start to identify number of lines in a text
Set FileName=NameOfFile
Set /a LineNumb=0
for /f "tokens=2 delims=:" %%a in ('find /c /v "" %FileName%') do set /a LineNumb=%%a
@Echo %FileName% has %LineNumb% lines.
Did you read my first post in this thread?
Re: Determining the number of lines in a file.
Perhaps you may want to test the Batch file below; it uses FINDSTR with the /O option to get the line length, and divides the file size by the line length to correctly get the remainder.
FINDSTR /O prints the byte offset of each line, so the offset of the second line is the length in bytes of the first (any) line, whether it ends in CR+LF or just LF.
Code: Select all
@echo off
setlocal EnableDelayedExpansion
set fSize=%~Z1
echo File size: %fSize%
rem Get the length of the first line
for /F "skip=1 delims=:" %%a in ('findstr /O "^" "%~1"') do set lineLen=%%a& goto break
:break
echo Line length: %lineLen%
rem Split the file size in groups of 4 digits
set N=0
:nextGroup
set group=%fSize:~-4%
:checkLeftZero
if "%group:~0,1%" neq "0" goto noLeftZero
set group=%group:~1%
if defined group goto checkLeftZero
:noLeftZero
if not defined group set group=0
set /A N+=1
set group[%N%]=%group%
set fSize=%fSize:~0,-4%
if defined fSize goto nextGroup
rem Divide the groups by the line length and assemble the result
set quotient=
set remainder=0
for /L %%i in (%N%,-1,1) do (
set /A group=remainder*10000+group[%%i], group[%%i]=group/lineLen, remainder=group%%lineLen
if not defined quotient (
if !group[%%i]! neq 0 set quotient=!group[%%i]!
) else (
set group=000!group[%%i]!
set quotient=!quotient!!group:~-4!
)
)
echo Number of records: %quotient%
echo Remainder: %remainder%
Antonio
EDIT: I fixed a small error: my text editor had split the longest line at "remainder=group%%lineLen".
Last edited by Aacini on 09 Feb 2013 08:33, edited 1 time in total.
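Aacini's loop is schoolbook long division over 4-digit groups: each step divides remainder*10000 + group by the line length, so no intermediate value grows with the file size. In the batch version that intermediate stays below 2147483647 only while the line length is at most 214748, and leading zeros must be stripped because SET /A reads numbers such as 0405 as octal. Python's int() has no octal issue. A sketch of the same technique, checked against sizes reported earlier in the thread; the name divide_decimal_string is hypothetical:

```python
def divide_decimal_string(number: str, divisor: int, group_digits: int = 4) -> tuple[str, int]:
    """Divide a non-negative decimal string by a small positive integer.

    Works group by group from the most significant end, like long division,
    so each step only handles remainder * 10**group_digits + group.
    Returns (quotient as a string, remainder).
    """
    if not (number.isascii() and number.isdigit()):
        raise ValueError("number must be a non-empty string of ASCII digits")
    if divisor <= 0:
        raise ValueError("divisor must be positive")

    # Left-pad with zeros so the string splits evenly into groups
    width = -(-len(number) // group_digits) * group_digits
    padded = number.zfill(width)
    base = 10 ** group_digits

    quotient_groups = []
    remainder = 0
    for start in range(0, width, group_digits):
        value = remainder * base + int(padded[start:start + group_digits])
        quotient_groups.append(value // divisor)  # always < base
        remainder = value % divisor

    # Zero-pad every group to group_digits, then drop leading zeros
    quotient = "".join(f"{q:0{group_digits}d}" for q in quotient_groups).lstrip("0") or "0"
    return quotient, remainder


print(divide_decimal_string("25690405", 1027))
```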
Re: Determining the number of lines in a file.
Aacini wrote:Perhaps you may want to test the Batch file below; it use FINDSTR with /O option to get the line length, and divide the file size by line length to correctly get the remainder.
FINDSTR /O get the length of the first (any) line in bytes independently if it ends in CR+LF or just LF.
The whole point of this thread is to quickly get the number of lines in a file, assuming all lines are the same length. As Squashman clearly stated in the opening post to this thread, any technique that uses FOR /F or FINDSTR or FIND to scan the entire file is unacceptable for performance reasons. Try running your script against a multi-gigabyte file.
Dave Benham
Re: Determining the number of lines in a file.
dbenham wrote:The whole point of this thread is to quickly get the number of lines in a file, assuming all lines are the same length. As Squashman clearly stated in the opening post to this thread, any technique that uses FOR /F or FINDSTR or FIND to scan the entire file is unacceptable for performance reasons. Try running your script against a multi-gigabyte file.
Dave Benham
Oops! You are right! For a moment I thought that the GOTO BREAK would break out of the FOR after reading the second line (I was so tired last night...).
This problem can be solved by executing FINDSTR /O outside a FOR and piping its output to code that reads just the second line and terminates, so the OS will cancel FINDSTR after a "write to a non-existent pipe" error. Here it is:
Code: Select all
rem Get the length of the first line
findstr /O "^" "%~1" 2>NUL | findstr /V "^0:" 2>NUL | (set /P lineLen=& set lineLen ) > lineLen.txt
for /F "tokens=2 delims==:" %%a in (lineLen.txt) do set lineLen=%%a
echo Line length: %lineLen%
This way, the first FINDSTR starts outputting line offsets, the second FINDSTR filters out the first one and passes the rest on, and the next code reads one line, saves it in a file and terminates. At that moment the second FINDSTR is aborted, and then the first FINDSTR is aborted too.
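The underlying idea is to read only the first line and stop. In Python that is just readline() on a binary stream: the length of the first line, including its CR+LF or LF, is the offset FINDSTR /O would report for line 2. A sketch, with the hypothetical name first_line_offset:

```python
import io


def first_line_offset(stream) -> int:
    """Return the length of the first line including its LF or CR+LF.

    This equals the offset FINDSTR /O reports for line 2. Only the first
    line is read; the rest of the stream is left untouched.
    """
    first = stream.readline()
    if not first.endswith(b"\n"):
        raise ValueError("first line has no LF terminator")
    return len(first)


# With a real file (path is a placeholder):
#     with open("EST2.txt", "rb") as f:
#         line_len = first_line_offset(f)
print(first_line_offset(io.BytesIO(b"ABC\r\nDEF\r\n")))
```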
I tried at first to directly read the second line this way:
Code: Select all
findstr /O "^" "%~1" 2>NUL | (set /P =&set /P lineLen=& set lineLen ) > lineLen.txt
Antonio