I work with fixed-field, fixed-length text files. This means every line in the text file is the same length, and each field in the line has a predefined starting and ending position. For example:
Name 1-30
Street 31-60
City 61-80
etc.
Most of the files I work with have a record length (line length) of several hundred bytes and can contain millions of records (lines).
Goal:
Quickly determine the record (line) length and the number of records in the file without using the FIND or FOR /F commands to parse the entire file. Using FIND with the /V and /C switches can take a long time, and using FOR /F with a counter takes forever as well.
My thought process: if I read the first line of the file using SET /P LINE1=<"%~1" and pass it to the String Length function, I can easily determine the line length, not counting the CR/LF (more about that later). I can also get the file size using the variable modifier %~z1, which should give me the total bytes in the file. Knowing the line length and the file size, I should be able to do some simple math to determine the total number of records in the file.
set /a records=%FileSize% / (%LineLength%+2)
I am adding 2 bytes to the line length for the CR/LF.
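To make the idea concrete, here is a rough sketch of the whole calculation in one script. It rests on several assumptions that may not hold for every file: the file is CRLF terminated, it is smaller than 2 GB (SET /A uses 32-bit signed math), and the first line fits within the SET /P input limit. The names RecLength and Remainder are made up for this sketch, and the length loop is the same bit-probing approach as the String Length function, written inline so the script stands alone.

```batch
@echo off
setlocal disableDelayedExpansion
REM Sketch only. Assumes a CRLF-terminated fixed-length file under 2 GB passed as the first argument.
if not exist "%~1" (
    echo File not found: "%~1"
    exit /b 1
)
set "FileSize=%~z1"

REM Read only the first line of the file
set "LINE1="
set /p "LINE1=" <"%~1"

setlocal enableDelayedExpansion
REM Inline string length: prefix an A so an empty line still measures as 0
set "s=A!LINE1!"
set "LineLength=0"
for /L %%A in (12,-1,0) do (
    set /a "LineLength|=1<<%%A"
    for %%B in (!LineLength!) do if "!s:~%%B,1!"=="" set /a "LineLength&=~1<<%%A"
)

REM Assumption: each record ends with CR/LF, so add 2 bytes
set /a "RecLength=LineLength+2"
set /a "Records=FileSize/RecLength"
set /a "Remainder=FileSize %% RecLength"

echo File size     : !FileSize!
echo Line length   : !LineLength!
echo Record count  : !Records!
REM A non-zero remainder suggests one of the assumptions above is wrong
if not "!Remainder!"=="0" echo Warning: !Remainder! bytes left over, check the line terminator
```

The remainder check is there as a sanity test: if the file size is not an exact multiple of the record length, then the line terminator, the measured line length, or the fixed-length assumption itself is probably off.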
Issues:
1. SET /P, as we all know, doesn't play nice with files that are only LF terminated.
2. Is there a way to determine whether the end of line is LF or CRLF? I need to know this to get the true line length: do I add 1 or 2 bytes? I know there is a trick for getting a CR into a variable, but is there a way to use that as a search string and test the errorlevel to see whether it is at the end of the line?
3. I have been testing the String Length function and it seems to be shorting me by one character.
Code:
@echo off
setlocal enableDelayedExpansion
REM This string is 116 bytes
set "str=Mr. & Mrs. John & Nancy Thompson 123 Any Street AnyWhere IL60054-1234 Mr. & Mrs. Thompson! KEY123 "
call :strlen str len
echo %len%
exit /b
:strLen string len -- returns the length of a string
:: -- string [in] - variable name containing the string being measured for length
:: -- len [out] - variable to be used to return the string length
:: Many thanks to 'sowgtsoi', but also 'jeb' and 'amel27' dostips forum users helped making this short and efficient
:$created 20081122 :$changed 20101116 :$categories StringOperation
:$source http://www.dostips.com
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,0) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b
Output
Code:
C:\Users\Squash\batch files\String_Length>StrLength.bat
115
Now, I don't often get data with exclamation points in it. We can pretty much go with the Pareto principle, except my 80/20 rule is more like 99/1: exclamation points and most other special characters are few and far between. Most of the data I get is just names and addresses and a few other number codes. If I change the exclamation point after the salutation to a space, it correctly counts the string length as 116.
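One experiment that might help narrow this down is to check whether the exclamation point is lost when the test string is assigned, before the String Length function ever sees it. The sketch below uses a shortened, made-up test string and the hypothetical variable names str and str2. It assigns the same text once with delayed expansion disabled and once with it enabled, then prints both values.

```batch
@echo off
setlocal disableDelayedExpansion
REM Hypothetical shortened test string, assigned while delayed expansion is OFF
set "str=Mr. & Mrs. Thompson! KEY123"

setlocal enableDelayedExpansion
REM Same text, assigned while delayed expansion is ON
set "str2=Mr. & Mrs. Thompson! KEY123"

REM Print both values in brackets so they can be compared character by character
echo [!str!]
echo [!str2!]
```

If the second line prints without the exclamation point, then the character is being dropped by the SET statement in my test script rather than miscounted by the function, which would mean data read from a file with delayed expansion disabled may not be affected the same way.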
Figuring out the LF end-of-line problem would also help me with another batch file that I use, but that will probably be its own thread.
Any help is greatly appreciated.