Processing text files with very large lines via FOR /F command

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México


#1 Post by Aacini » 27 Mar 2017 00:07

Operating rules of the FOR /F command regarding the "tokens=..." option:

  • The maximum number of tokens in a FOR /F command is 32, including the last "rest of the line" token: "tokens=1-31*".
  • The maximum length of a token is limited by the maximum length of the command line that uses it, which is 8191 bytes. Shorter commands allow larger tokens.
  • A line of a text file may contain at most 4126 tokens, when every token is just one character long and the command that processes the last "rest of the line" token is not too large. If the tokens after the 31st are longer, this maximum decreases accordingly, because the "rest of the line" token must always fit in its 8191-byte command line.
  • The maximum length of a line in a text file may be close to 261,000 bytes, when all 32 possible tokens have a length near 8191 bytes.
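
The 32-token rule can be tried with a minimal sketch like the one below. The sample line with tokens t1 to t35 is made up for illustration, and the default space delimiter is assumed. The FOR variables start at %%@, as in the program further down, so %%Z is the 27th token and %%_ is the 32nd, which receives the rest of the line.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Build a made-up sample line with 35 space-separated tokens: t1 t2 ... t35
set "line="
for /L %%i in (1,1,35) do set "line=!line! t%%i"

rem With tokens=1-31* the first 31 tokens get one variable each,
rem and the 32nd variable receives everything that remains in the line
for /F "tokens=1-31*" %%@ in ("!line!") do (
   echo Token  1: %%@
   echo Token 27: %%Z
   echo Rest    : %%_
)
```

If the rule holds as described, the last line shows t32 through t35 together, because everything after the 31st token is delivered as one piece.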

The method used to determine the previous rules was explained in detail in this post; you should read it before continuing with this topic.

The FOR /F operating rules indicate that it is possible to process a text file that contains very long lines, up to approximately 261,000 characters. However, to make good use of this ability, certain adjustments to the data file are necessary, because it is unlikely that the data already has the format that FOR /F requires. This topic presents several examples of the management needed to store such an amount of data in each line of a text file. The same approach may be used in any other application, as long as the required modifications can be applied to the data file.

The example program uses a file called "books.txt" that stores a "book" of up to 260,800 bytes in each line. This is the procedure that achieves it:

  • The original lines of each "book" are grouped in "fields". Each line is separated from the next one by the extended ASCII character 254 (þ). Any number of lines may be grouped in one field, up to the limit of 8150 bytes per field.
  • The fields of a book are stored in one line of the books.txt file, separated by the extended ASCII character 255 (ÿ). There may be up to 32 fields in each line of the text file, so the maximum number of bytes per book is 32 × 8150 = 260,800, including the separators between lines and fields.
  • Each book (physical line) in the books.txt file is terminated by a <CR><LF> character pair, as usual. The maximum number of books in the file is limited by the maximum file size specified by the OS (2 GB).

In schematic form:

Code: Select all

Field 1:        Line one.þLine two.þLine three.þEt cetera.      Up to 8150 bytes

Book 1:         Field 1ÿField 2ÿField 3ÿEt cetera               Up to 32 fields

books.txt       Book 1<CR><LF>Book 2<CR><LF>Etc<CR><LF>         Up to 2GB size
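
The capacity figures of this layout can be checked with a few lines of arithmetic. This is only a sketch: the variable names are invented here, and reading the gap between 8191 and 8150 as room left for the command text is an inference, not something the post states.

```batch
@echo off
setlocal

rem Limits taken from the description above
set /A "fieldLimit=8150, maxFields=32"

rem Bytes per book, and bytes left in an 8191-byte command line
rem once a full field has been inserted (inferred purpose of the margin)
set /A "maxBook=fieldLimit*maxFields, headroom=8191-fieldLimit"

echo Maximum bytes per book: %maxBook%
echo Bytes left per command line: %headroom%
```

This prints 260800 for the book size and 41 for the remaining margin.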

The program performs all the conversions required to manage this file format. For example, to consult a "book", the program reads one line from books.txt, splits it into the original book lines, writes them to an individual file and opens that file with Notepad so the user may review and edit it. If the book was modified, the program reads the lines back from the individual file and stores them in a single line of books.txt; the original book is updated in place, so its position in the file is preserved. A new book may also be inserted.

Code: Select all

@echo off
setlocal EnableDelayedExpansion


:nextBook

rem Show available books and let the user select one
cls
echo/
echo Available books:
echo/
set i=1
for /F "delims=þ" %%a in (books.txt) do (
   echo !i!- %%a
   set "book[!i!]=%%a"
   set /A i+=1
)
echo/
set /P "book=Enter book number (%i% to add a new book): "
if errorlevel 1 goto :EOF
if "%book%" equ "%i%" (
   set /P "book[%book%]=Enter name of new book: "
   echo !book[%book%]!þEnter new book contents here>> books.txt
)
if not defined book[%book%] echo No such book & goto endBook


rem Extract the book into an individual file and show it in Notepad
set "name=!book[%book%]!"
set /A skip=book-1
if %skip% gtr 0 (set "skip=skip=%skip%") else set "skip="
(for /F "%skip% tokens=1-31* delims=ÿ" %%@ in (books.txt) do call :ReadBook & goto continue) > "%name%.txt"
:continue
echo/
echo -^> Editing book: "%name%"
(
copy "%name%.txt" "%name%.bak"
notepad "%name%.txt" | pause
fc "%name%.txt" "%name%.bak"
set "errLevel=!errorlevel!"
del "%name%.bak"
) > NUL
if %errLevel% equ 0 goto endBook


rem Copy all books in books.txt file, update this one
< NUL (

   rem Process all lines in books.txt file, with 32 possible tokens each
   set "i=0"
   for /F "tokens=1-31* delims=ÿ" %%@ in (books.txt) do (
      set /A i+=1
      if !i! neq %book% (

         rem Copy up to 32 fields of other books
         set "tokens=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
         set "field=x"
         for /L %%i in (1,1,32) do if defined field (
            call :ReadField field="!tokens:~0,1!"
            if defined field set /P "=!field!ÿ"
            set "tokens=!tokens:~1!"
         )
         echo/

      ) else (

         rem Save the modified book in its original place
         ECHO Splitting file in fields . . . > CON
         set "field=þ!book[%book%]!"
         call :strLen field
         set /A "currentLen=len-1"
         SET "j=1"
         SET /P "=Field #!j!: !currentLen!, " > CON
         for /F "usebackq delims=" %%a in ("%name%.txt") do (
            set "fieldNew=þ%%a"
            call :strLen fieldNew
            set /A "newLen=currentLen+len"
            if !newLen! lss 8150 (
               set "field=!field!!fieldNew!"
               set /A "currentLen+=Len"
               SET /P "=!currentLen!, " > CON
            ) else (
               set /P "=!field:~1!ÿ"
               set "field=!fieldNew!"
               set /A "currentLen=len"
               ECHO/> CON
               ECHO/> CON
               SET /A j+=1
               SET /P "=Field #!j!: !currentLen!, " > CON
            )
         )
         echo(!field:~1!
         ECHO/> CON

      )
   )
) > books.tmp
(
del "%name%.txt"
move /Y books.tmp books.txt
) > NUL
echo/
echo Modified book stored in books.txt file

:endBook
echo/
pause
goto nextBook



:strLen strvar
set "str=0!%~1!"
set "len=0"
for /L %%a in (12,-1,0) do (
   set /A "newLen=len+(1<<%%a)"
   for %%b in (!newLen!) do if "!str:~%%b,1!" neq "" set "len=%%b"
)
exit /B


:ReadField field="token"
for %%. in (.) do set "%1=%%%~2"
exit /B


:ReadBook
setlocal EnableDelayedExpansion

set "tokens=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
set "firstLine=1"

SET /P "=Reading book fields: " < NUL > CON
:nextField
   SET /P "=%tokens:~0,1%, " < NUL > CON
   for %%. in (.) do set "field=%%%tokens:~0,1%"
   if not defined field goto endFields

   :nextLine
      for /F "tokens=1* delims=þ" %%a in ("!field!") do (
         if not defined firstLine echo(%%a
         set "firstLine="
         set "field=%%b"
      )
   if defined field goto nextLine

   set "tokens=%tokens:~1%"
if defined tokens goto nextField
:endFields
ECHO/> CON
ECHO/> CON
exit /B

To start using this program, create the books.txt file with just one simple book. The name of the book is stored as the first line of the book. For example:

books.txt

Code: Select all

Name of bookþFirst line of book.þSecond line of book.

This program is just a proof of concept; it lacks many details that would be necessary to turn it into a fully working and robust application.

Antonio

Thor
Posts: 43
Joined: 31 Mar 2016 15:02

Re: Processing text files with very large lines via FOR /F command

#2 Post by Thor » 27 Mar 2017 11:20

Hi Aacini,

I have the following book file, called "Books.txt" (enclosed).
When I run your batch file, which I call "book.bat", I get the following menu:

Available books:

1- Book Name 1
2- Book Name 2
3- Book Name 3

Enter book number (4 to add a new book): 1
'/d' is not recognized as an internal or external command,
operable program or batch file.


Press any key to continue . . .

I get the error shown above. Do you know why?
Attachment: Books.zip (339 Bytes)

pieh-ejdsch
Posts: 240
Joined: 04 Mar 2014 11:14
Location: germany

Re: Processing text files with very large lines via FOR /F command

#3 Post by pieh-ejdsch » 27 Mar 2017 13:03

I think the idea is very good; it works like a sort of library.
To read the books correctly, delayed expansion must be disabled while each line is read and re-enabled only after the line has been stored in a variable. Otherwise the exclamation marks in the text are lost. All variables that are set while delayed expansion is enabled must then be carried across the ENDLOCAL boundary before it is disabled again.
Lines beginning with a semicolon are skipped and not read. To avoid this, the end-of-line character (the "eol" option) must be set to nothing.
Empty lines are currently ignored; they must be treated separately in order to include them.
If an existing file in the current folder is selected as a new book, that file must only be read and must not be deleted after the import.
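
The three reading problems listed here (exclamation marks, lines starting with a semicolon, empty lines) can be reproduced and worked around in a small test. The sketch below is not the change made to the book program: it swaps in the common findstr /N idiom instead of the "eol" option, and the sample file name and contents are invented for the demonstration.

```batch
@echo off
setlocal DisableDelayedExpansion

rem Hypothetical sample file containing the three problem cases
set "file=%temp%\forf-demo.txt"
> "%file%" (
   echo Hello, world!
   echo/
   echo ;starts with a semicolon
)

rem findstr /N prefixes every line with its number and a colon, so no line
rem is empty for FOR /F and none begins with the default eol character
for /F "delims=" %%a in ('findstr /N "^" "%file%"') do (
   rem Store the line while delayed expansion is still disabled
   set "line=%%a"
   setlocal EnableDelayedExpansion
   rem Strip the number prefix, then print the line unchanged
   set "line=!line:*:=!"
   echo(!line!
   endlocal
)
del "%file%"
```

The line is assigned before delayed expansion is enabled, which is what keeps the exclamation mark intact; the ENDLOCAL inside the loop keeps the SETLOCAL nesting from growing with every line.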

On the whole the script is very good.
I am working on it.

Phil

pieh-ejdsch
Posts: 240
Joined: 04 Mar 2014 11:14
Location: germany

Re: Processing text files with very large lines via FOR /F command

#4 Post by pieh-ejdsch » 09 Apr 2017 13:02

Hello,

I have eliminated a few more oddities.
All empty lines are now preserved. Each empty line is counted relative to the start of the file or to the previous empty line, which counts as zero. This achieves something like a compression of the original.
When a new field is created, it gets a marker at its start as a placeholder.
The original file may not end with a line break, so a copy can be two bytes larger than the original.
I have turned a few functions into macros.
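
The relative counting of empty lines can be tried on its own with a short sketch. It mirrors the findstr loop of the script below, but the sample file and the variable names offsets, last and delta are invented here.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample file: lines 2 and 5 are empty
set "file=%temp%\blank-demo.txt"
> "%file%" (
   echo one
   echo/
   echo two
   echo three
   echo/
)

rem findstr /N "^$" lists the numbers of the empty lines; each one is
rem stored as its distance from the previous empty line (or from line 0)
set "offsets=,"
set "last=0"
for /F "delims=:" %%E in ('findstr /N "^$" "%file%"') do (
   set /A "delta=%%E-last, last=%%E"
   set "offsets=!offsets!!delta!,"
)
echo Empty-line offsets: !offsets!
del "%file%"
```

For this sample the empty lines are numbers 2 and 5, so the stored offsets are 2 and 3. Small relative values keep the list short even in long files, which is presumably the "compression" mentioned above.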

Code: Select all

@echo off

setlocal disableDelayedExpansion
call :setAllMacros

:nextBook

 rem Show available books and let the user select one
cls
echo/
echo Available books:
echo/
set i=1
set "inFolder= "
for /F "delims=þ" %%a in (books.txt
) do (
   set "getLine=%%a"
   setlocal enableDelayedExpansion
   if exist %%a.txt set "inFolder=*"
   echo  !inFolder! !i! - %%a
   
   %endlocal.Set( 4 3 2 1)tokens: 1= book[!i!]=!getLine!%
   set /A i+=1
)
echo/
echo/
set /P "book=Enter book number (%i% to add a new book): "
if errorlevel 1 goto :EOF
if "%book%" equ "%i%" (
   set isNewBook=1
   set /P "book[%book%]=Enter name of new book: "
   
   setlocal enabledelayedexpansion
   echo !book[%book%]!þ,þEnter new book contents here>> books.txt
   
   endlocal
) else set "isNewBook="

call set "name=%%book[%book%]%%"

if exist "%name%.txt" echo "%name%.txt" already exists! &goto endBook
 rem Extract the book into an individual file and show it in Notepad
set /A skip=book-1
if %skip% gtr 0 (set "skip=skip=%skip%") else set "skip="

call :ExtractBook

if :Sub == begin ( * begin :ExtractBook
 :ExtractBook
  4> "%name%.txt" (
    for /F "%skip% tokens=1-31* delims=ÿ" %%@ in (books.txt) do call :ReadBook & exit /b
  )
if :Sub == END * END :ExtractBook )

echo/
echo -^> Editing book: "%name%"
(
copy "%name%.txt" "%name%.bak"
notepad "%name%.txt" | >nul pause
fc "%name%.txt" "%name%.bak"
) >NUL
>nul (
del "%name%.bak"
if %errorLevel% equ 0 goto endBook
)
:: ---------------------------------------------- ^ OK

rem Copy all books in books.txt file, update this one
set "tokens= @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
< NUL (

    rem Process all lines in books.txt file, with 32 possible tokens each
   set "i=0"
   for /F "tokens=1-31* delims=ÿ" %%@ in (books.txt) do (
      set /A i+=1

      setlocal enableDelayedExpansion
      if !i! neq %book% ( endlocal
         
          rem Copy up to 32 fields of other books
         %copyFields%
         
       
      ) else (
         
          rem Save the modified book in its original place
         ECHO Splitting file in fields . . . > CON
         set "allBlankLines=þ!book[%book%]!þ,"
         set "point="
         
          rem Save all empty lines in ordinal addition
         set "lastBlankLine=0"
         for /f "delims=:" %%E in ('findstr /n "^$" "%name%.txt"') do (
           set /a "nextBlankLine=%%E-lastBlankLine, lastBlankLine=%%E"
           set "allBlankLines=!allBlankLines!!nextBlankLine!,"
         )
         %strLen(var):var=!allBlankLines!%
         set /A "currentLen=len-1, j=1"
         SET /P "=Field #!j!: !currentLen!" > CON
         
         %endlocal.Set( 4 3 2 1)tokens: 3 2 1= j=!j! currentLen=!currentLen! field=!allBlankLines!%
         for /F usebackQdelims^=^ eol^= %%a in ("%name%.txt") do (
            set "fieldNew=þ%%a"
           
            setlocal enableDelayedExpansion
            %strLen(var):var=!fieldNew!%
            set /A "newLen=currentLen+len"
            if !newLen! lss 8150 (
               set "field=!field!!fieldNew!"
               set /A "currentLen+=Len"
               SET /P "=, !currentLen!" > CON
            ) else (
               if !j! gtr 1 set "point=."
               >&4 set /P "=!point!!field:~1!ÿ"
               set "field=!fieldNew!"
               set /A "currentLen=len+1"
               ECHO/> CON
               ECHO/> CON
               SET /A j+=1
               SET /P "=Field #!j!: !currentLen!" > CON
            )
           
            %endlocal.Set( 4 3 2 1)tokens: 3 2 1=  j=!j! currentLen=!currentLen! field=!field!%
         )
         
         setlocal enableDelayedExpansion
         if !j! gtr 1 set "point=."
         >&4 echo(!point!!field:~1!
         endlocal
         ECHO/> CON
      )
   )
) 4> books.tmp


(
del "%name%.txt"
move /Y books.tmp books.txt
) > NUL
echo/
echo Modified book stored in books.txt file

:endBook
echo/
pause
goto nextBook


:ReadBook
setlocal disableDelayedExpansion
set "allTokens=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
set "firstLine="
set "secondLine="
set "order=-1"
set "allBlankLines=0"
set "nextBlankLine=0"
set "comma="

SET /P "=Reading book fields: " < NUL > CON
 :nextField
   for %%. in (.) do set "field=%%%allTokens:~0,1%"
   if not defined field goto endFields
   SET /P "=%comma%%allTokens:~0,1%" < NUL > CON
 
    :nextLine
     
      setlocal enableDelayedExpansion
      for /F tokens^=1*delims^=þ^ eol^= %%a in ("!field!") do (
         
         if NOT .==.!! setlocal enableDelayedExpansion
         if !nextBlankLine!==!order! (
           for %%e in (!allBlankLines!) do if !order!==%%e (
             set "allBlankLines=!allBlankLines:*,%%e,=,!"
             if not defined firstline ( set "allBlankLines=%%a"
             ) else ( set /a order=1
               >&4 echo(
             )
           ) else set /a order=0
           if defined firstLine set /a order=1
           if NOT !allBlankLines!==^, for /f "delims=," %%e in ("!allBlankLines!"
           ) do if NOT .%%e==. set /a "nextBlankLine=%%e"
           
           %endlocal.Set( 4 3 2 1)tokens: 3 2 1= order=!order! nextBlankLine=!nextBlankLine! allBlankLines=!allBlankLines!%
         )
         if .==.!! endlocal
         set "field=%%b"
         if defined firstLine if not defined comma ( >&4 echo(%%a
         ) else ( set "inLine=%%a"
           setLocal enableDelayedExpansion
           >&4 echo(!inLine:~1!
           endlocal
           set "comma="
         )
         if defined secondLine set /a firstLine=1
         set /a secondLine=1
         set /a Line+=1, order+=1
      )
      if .==.!! endlocal
   if defined field goto nextLine

   set "allTokens=%allTokens:~1%"
   set "comma=, "
if defined allTokens goto nextField
:endFields
for %%e in (%allBlankLines%) do >&4 echo(
ECHO/> CON
ECHO/> CON
exit /B

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:setAllMacros
:: define LF as a Line Feed (newline) character
set ^"LF=^

^" Above empty line is required - do not remove

:: define a newline with line continuation
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:strLen.var
@for %%T in ("%temp%\%~n0.tmp.cmd") do @(
 @ >%%T (
  echo( @set strLen(var^)=(%%\n%%
  echo( set "str=Avar"%%\n%%
  echo( set "len=0"%%\n%%
  for /l %%i in (12 -1 0) do @(
   echo( set /a "len|=1<<%%i"%%\n%%
   echo( for %%%%# in (!len!^) do if .!str:~%%%%#^^^^^^,1!==. set /a "len&=~1<<%%i"%%\n%%
  )
  echo(^)
 )
 call %%T
 del %%T
)

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:copyFields
:: create a macro to
:: copy a valid token
set "allTokens= @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
for %%T in ("%temp%\%~n0.tmp.cmd") do (
  >%%T (
  echo(@set copyFields=^^^>^^^&4 (%%\n%%
  for /l %%i in (1 1 32) do (
   echo( if /i .%%%%^^^^^^^%%allTokens:~%%i,1%% neq . set /P "=%%%%%%allTokens:~%%i,1%%ÿ"%%\n%%
  )
  echo( echo(%%\n%%
  echo(^)
 )
 call %%T
 del %%T
)
set "allTokens="


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:endlocal.Set
:: create a macro to
:: go out of delayed variable expansion
:: set the contents of 1-4 variables across the ENDLOCAL border
:: Variable 2-4 becomes delim=Space, Variable 1 has no delim
::  ---------------------------------------------------------
:: notice: -- the beginning          \SPACES/ are required --
::                                     v v 
:: usage: %endlocal.Set(4 3 2 1)tokens: 2 1= var!i!=!varName2! varName1=!varContent!%
::                                          ^                 ^
:: notice: --       \this/ .Delim repeat  /Here\    and     /HERE\  --
::                     v
if "%~1" == "" ( set ".delim= "
) else           set ".delim=%~1"

set endlocal.Set( 4 3 2 1)tokens=(%\n%
 for /f "tokens=1-3* delims=%.delim%" %%1 in (%\n%
  "%.delim%4%.delim%3%.delim%2%.delim%1"%\n%
 ) do (%\n%
  if .==.!! endlocal%\n%
  if NOT .%%1==.4 (%\n%
   set "%%~1"%\n%
  )%\n%
  if NOT .%%2==.3 (%\n%
   set "%%~2"%\n%
  )%\n%
  if NOT .%%3==.2 (%\n%
   set "%%~3"%\n%
  )%\n%
  if NOT :.%%4==:.1 (%\n%
   set "%%~4"%\n%
)))
set ".delim="
set "LF="
set "\n="
exit /B


Phil
