@rojo, your code works like a charm too. Thanks for the more efficient code.
Plasma33
Batchscript to extract texts from multiple lines
Re: Batchscript to extract texts from multiple lines
I completed a pure Batch file solution for this problem, comprised of two parts. The first part splits the original 3-line file into three files. The first and third files each hold the corresponding original long line unchanged, but the second file holds the second long line split into shorter lines (1023 bytes each) so that it can be processed by a FOR /F command; each short line then matches the number of bytes that will be read from the first and third files by SET /P commands.
Code: Select all
@echo off
setlocal EnableDelayedExpansion
rem Split a file with 3 very long lines into 3 files:
rem the first and third ones with the original lines 1 and 3 of input file
rem and the second one with shorter lines from line 2
rem http://www.dostips.com/forum/viewtopic.php?f=3&t=4945
echo Processing file, please wait...
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
for /L %%i in (1,1,3) do del input%%i.txt 2> nul
set "in=1"
call :SplitLines < input.txt
goto :EOF
:SplitLines
echo/
echo Reading input line # %in%
set "lineNum=0"
:loopLine
set /P "line="
>> input%in%.txt set /P "=%line%" < NUL
if %in% equ 2 >> input%in%.txt echo/
set /A "lineNum+=1"
set /P "=Output line: %lineNum%!CR!" < NUL
if "%line:~1022%" neq "" goto loopLine
echo/
set /A in+=1
if %in% leq 3 goto SplitLines
exit /B
The split method used is explained in this topic. In this case, the end of each original line is detected when a SET /P command reads fewer than 1023 bytes; this means the method will fail if an original line has a length that is an exact multiple of 1023.
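To illustrate the end-of-line test described above, here is a minimal sketch; it is not part of the original solution. It applies the same substring check used at :loopLine to a full 1023-character chunk and to a shorter one, and then lists which line lengths would end on a full chunk. The variable names full, short and last and the sample lengths are made up for the illustration, and the lengths count only the characters before the line terminator.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Build a full 1023-character chunk and a shorter 1000-character one
set "full="
for /L %%i in (1,1,1023) do set "full=!full!x"
set "short=!full:~0,1000!"

rem Same test as in :loopLine: is there a character at offset 1022?
for %%v in (full short) do (
   if "!%%v:~1022!" neq "" (
      echo %%v: 1023 characters read, so more of the line is assumed to follow
   ) else (
      echo %%v: fewer than 1023 characters read, so the line is treated as finished
   )
)

rem A line whose length is an exact multiple of 1023 ends on a full chunk
for %%L in (1000 1023 2046) do (
   set /A "last=%%L %% 1023"
   if !last! equ 0 (
      echo length %%L: the last chunk is full, so the end of the line is missed
   ) else (
      echo length %%L: the last chunk has !last! characters, so the end of the line is detected
   )
)
```

The test cannot tell a full chunk in the middle of a line from a full chunk that happens to be the last one, which is why lengths such as 1023 or 2046 are the problematic ones.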
The second part processes the three files created by the first part and generates the desired output:
Code: Select all
@echo off
setlocal EnableDelayedExpansion
rem Process input2.txt as base template and input1/input3 as additional input files,
rem merge the three files and generate output file
echo Processing files, please wait...
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
for /L %%i in (1,1,3) do set "prev%%i="
set /A prevLen=0, line=0
echo Start: %TIME%
(for /F "delims=" %%a in (input2.txt) do (
   rem Read the lines and fix broken alignment between lines
   set /P "ln1=" & set "ln2=%%a" & set /P "ln3=" <&3
   for /L %%i in (1,1,3) do (
      set "ln%%i=.!prev%%i!!ln%%i!."
      set "prev%%i="
   )
   set /A totLen=1023+prevLen, prevLen=0, line+=1
   set /P "=Input line: !line!!CR!" < NUL > CON
   rem Generate output accordingly to template in input2
   for /L %%i in (0,1,!totLen!) do (
      if "!ln2:~%%i,2!" equ ".|" (
         set /A beg=%%i+1
      ) else if "!ln2:~%%i,2!" equ "|." (
         set /A len=%%i-beg+1
         if %%i lss !totLen! (
            for %%m in ("!beg!,!len!") do for /L %%j in (1,1,3) do echo !ln%%j:~%%~m!
            echo/
         ) else (
            rem Possible alignment broken at end of line
            for %%m in ("!beg!,!len!") do for /L %%j in (1,1,3) do set "prev%%j=!ln%%j:~%%~m!"
            set "prevLen=!len!"
         )
      )
   )
)) < input1.txt 3<input3.txt > output.txt
echo End: %TIME%
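The merge script above reads three files in lockstep: FOR /F iterates over one file while SET /P pulls the matching data from the others through redirected handles. A minimal sketch of that idea with two small files might look like this; the file names demo_a.txt and demo_b.txt and the variable other are made up for the illustration, and the files are created and deleted in the current directory.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Create two small sample files (hypothetical names)
> demo_a.txt (echo one& echo two& echo three)
> demo_b.txt (echo 1& echo 2& echo 3)

rem FOR /F drives the loop over demo_a.txt;
rem SET /P reads the matching line of demo_b.txt from handle 3
(for /F "delims=" %%a in (demo_a.txt) do (
   set "other="
   set /P "other=" <&3
   echo %%a - !other!
)) 3< demo_b.txt

rem Clean up the sample files
del demo_a.txt demo_b.txt
```

Because the redirection is attached to the whole parenthesized block, the handle stays open across iterations and each SET /P continues where the previous one stopped.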
I tested this solution with the 2.86 MB input file and the output was generated correctly. Of course, this method is much slower than the C#, JScript or PowerShell ones, but it can be modified more easily: you only need to know Batch file programming in order to do so...
Antonio
Re: Batchscript to extract texts from multiple lines
There are two minor bugs in your code.
The first one is in your splitting batch and only occurs with specific line lengths, for example 1022 characters (if you use "\r\n" as the end-of-line marker):
If the newline character ('\n') ends up as the 1024th character of the "set/p" input buffer, then your algorithm will correctly notice the end of the line,
but the next "set/p" will read the newline character and the next line's data ("\n|||||..." in the following example), producing a "\r\n" as the first characters in "input2.txt".
Sample file that provokes this issue on line 2 (the issue should also be possible for line 3, with a much bigger sample file):
Code: Select all
CCC.............1022.......C
|||.............1022.......|
CCC.............1022.......C
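The sample above is abbreviated. A file of that shape could be generated with a short sketch like the following, assuming each line is meant to hold exactly 1022 characters followed by "\r\n"; the output name sample1022.txt and the variables c and p are made up for the illustration.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Build two 1022-character strings: one of "C", one of "|"
set "c="
set "p="
for /L %%i in (1,1,1022) do (
   set "c=!c!C"
   set "p=!p!|"
)

rem Write them as three lines; ECHO terminates each line with CR+LF
> sample1022.txt (
   echo !c!
   echo !p!
   echo !c!
)
```

To try it with the splitting batch, the generated file would have to be copied to input.txt, since that is the name the script reads.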
The second issue is that leading spaces are ignored by set/p, so you might lose some spaces in "input2.txt":
Code: Select all
Z:\<nul set /P "= 1 " & echo(#
1 #
penpen
Re: Batchscript to extract texts from multiple lines
Thanks guys!!
Plasma33