Page 1 of 1

Best way to pull out specifc lines from text file? JREPL?

Posted: 23 Sep 2021 03:47
by SIMMS7400
HI Folks -

I have the following text file that is an output from a data load process I run. I need to scan the file and if an instance of "| Error: " is found, I need to create another file with a header and applicable rows.

Here is my file which I have pulled out at the etra garbage to keep it short:

Code: Select all

2021-09-15 10:07:01,454 DEBUG [AIF]: Obtained connection to essbase cube: FinPlan
2021-09-15 10:07:01,496 DEBUG [AIF]: enableZeroLoad: 
2021-09-15 10:07:01,496 DEBUG [AIF]: SELECT '"'||ACCOUNT||'"','"'||ENTITY||'"','"'||UD1||'"','"'||UD2||'"','"'||UD3||'"','"'||UD4||'"','"'||UD5||'"','"'||SCENARIO||'"','"'||YEAR||'"','"'||PERIOD||'"',AMOUNT
2021-09-15 10:07:02,068 DEBUG [AIF]: Locked rule file: AIF0002
2021-09-15 10:07:02,078 INFO  [AIF]: Loading data into cube using data file...
2021-09-15 10:07:02,182 INFO  [AIF]: purge data file: /u03/inbox/outbox/FinINTEL_2185.dat
2021-09-15 10:07:02,191 INFO  [AIF]: Load data encountered the following errors:
| Error: 3303 | DP_0A10 | "AC_505110","DP_0A10","Final","BL_ALDPT","No Market","Load","PD_000000000","Actual","FY21","Aug",241.7 | 
| Error: 3303 | DP_0A10 | "AC_505120","DP_0A10","Final","BL_ALDPT","No Market","Load","PD_000000000","Actual","FY21","Aug",5254.76 | 
| Error: 3303 | DP_0A10 | "AC_505130","DP_0A10","Final","BL_ALDPT","No Market","Load","PD_000000000","Actual","FY21","Aug",130.19 | 
| Error: 3303 | DP_0A10 | "AC_505140","DP_0A10","Final","BL_ALDPT","No Market","Load","PD_000000000","Actual","FY21","Aug",819.02 | 
| Error: 3303 | DP_0A10 | "AC_505200","DP_0A10","Final","BL_ALDPT","No Market","Load","PD_000000000","Actual","FY21","Aug",270.09 | 
| Error: 3303 | DP_0A10 | "AC_505300","DP_0A10","Final","BL_ALDPT","No Market","Load","PD_000000000","Actual","FY21","Aug",-6.37 | 
| Error: 3303 | DP_0A10 | "AC_505400","DP_0A10","Final","BL_ALDPT","No Market","Load","PD_000000000","Actual","FY21","Aug",4.35 | 
As stated if "| Error: " is found (and I need to search that exact string as there are other instances with "Error:" that I don't want to include) I then need to find the line that contains SELECT '"'||ACCOUNT||'"' and then parse that line as my header and then parse the contents of the "| Error: " lines accordingly after the 3rd instance of PIPE:

I need my results as follows:

Code: Select all

ACCOUNT,ENTITY,UD1,UD2,UD3,UD4,UD5,SCENARIO,YEAR,PERIOD,AMOUNT
AC_505110,DP_0A10,Final,BL_ALDPT,No Market,Load,PD_000000000,Actual,FY21,Aug,241.7
AC_505120,DP_0A10,Final,BL_ALDPT,No Market,Load,PD_000000000,Actual,FY21,Aug,5254.76
AC_505130,DP_0A10,Final,BL_ALDPT,No Market,Load,PD_000000000,Actual,FY21,Aug,130.19
AC_505140,DP_0A10,Final,BL_ALDPT,No Market,Load,PD_000000000,Actual,FY21,Aug,819.02
AC_505200,DP_0A10,Final,BL_ALDPT,No Market,Load,PD_000000000,Actual,FY21,Aug,270.09
AC_505300,DP_0A10,Final,BL_ALDPT,No Market,Load,PD_000000000,Actual,FY21,Aug,-6.37
AC_505400,DP_0A10,Final,BL_ALDPT,No Market,Load,PD_000000000,Actual,FY21,Aug,4.35
Is that possible? I have this code to grab the applicable lines but I'm not sure if I should be parsing each piece separately or if I can spool to a temp file and parse all at the same time?

Code: Select all

FINDSTR /C:"||ACCOUNT||" /C:"| Error: " "DMLog.txt" >>output.txt

Re: Best way to pull out specifc lines from text file? JREPL?

Posted: 23 Sep 2021 09:24
by T3RRY
If using powershell commands via batch is an option, this becomes very trivial thanks to regex.

Code: Select all

:# https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_regular_expressions?view=powershell-7.1

@Echo off & CD /d "%~dp0"
 Set "File=%~dp0input.txt" %= Replace with actual filepath =%

 Del output.txt 2> nul

 For /f "delims=" %%v in ('where /r "%Windir%\SysWOW64\WindowsPowerShell" powershell.exe')Do Set "powershell=%%v"

 %__APPDIR__%Findstr.exe /Blic:"| Error:" "%File%" > nul && (
  "%Powershell%" -nologo -noprofile -command ^
   "($Contents = (GC "%File%")) -match '^.*?[|][|][A][C][C][O][U][N][T]' | ForEach{"$_" -replace '\""' -replace '^.*?[|][|][A][C][C]','ACC' -replace '[|]' -replace ''''}; "$Contents" -match '^[|] [E][r][r][o][r][:] \d+' | ForEach {"$_"  -replace '\""' -replace '^[|] [E][r][r][o][r][:]\s\d+\s[|]\s\w+[_][0-9a-z]+\s[|]\s' -replace '\s[|]\s',',' -replace '[,]$'}; out-file -filepath .\output.txt"
  type output.txt
 )

Goto:Eof


Forgive the excessively long powershell command.

for the sake of reviewing the line the same command escaped as a multiline command:

Code: Select all

 %__APPDIR__%Findstr.exe /Blic:"| Error:" "%File%" > nul && (
  "%Powershell%" -nologo -noprofile -command ^
   ($Contents = (GC "%File%"^)^) -match '^^.*?[^|][^|][A][C][C][O][U][N][T]' ^|^
     ForEach{"$_" -replace '\""' -replace '^^.*?[^|][^|][A][C][C]','ACC' -replace '[^|]' -replace ''''};^
    "$Contents" -match '^^[^|] [E][r][r][o][r][:] \d+' ^|^
     ForEach {"$_"  -replace '\""' -replace '^^[^|] [E][r][r][o][r][:]\s\d+\s[^|]\s\w+[_][0-9a-z]+\s[^|]\s' -replace '\s[^|]\s',',' -replace '[,]$'};^
     out-file -filepath .\output.txt
  type output.txt
 )

Re: Best way to pull out specifc lines from text file? JREPL?

Posted: 23 Sep 2021 09:48
by Aacini
This works:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

findstr /C:"| Error: " test.txt > temp1.txt
if errorlevel 1 goto :EOF

for /F "usebackq tokens=1* delims=|" %%a in (`findstr /R /C:"SELECT '.'||ACCOUNT||" test.txt`) do set "header=%%b"
set "header=!header:||'"','"'||=,!" & set "header=!header:||'"'=!"

(echo !header!
for /F "tokens=3 delims=|" %%a in (temp1.txt) do (
   set "line=%%a"
   set "line=!line:~1,-1!"
   echo !line:"=!
)) > output.txt
Antonio

Re: Best way to pull out specifc lines from text file? JREPL?

Posted: 23 Sep 2021 11:19
by SIMMS7400
Aacini - that worked like a charm!! Thank you!!

Re: Best way to pull out specifc lines from text file? JREPL?

Posted: 23 Sep 2021 21:48
by Aacini
When I read this phrase:
T3RRY wrote:
23 Sep 2021 09:24
If using powershell commands via batch is an option, this becomes very trivial thanks to regex.
I expected to see a "very trivial code" (compared vs a Batch file)...

However, when I saw this:
T3RRY wrote:
23 Sep 2021 09:24

Code: Select all

:# https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_regular_expressions?view=powershell-7.1

@Echo off & CD /d "%~dp0"
 Set "File=%~dp0input.txt" %= Replace with actual filepath =%

 Del output.txt 2> nul

 For /f "delims=" %%v in ('where /r "%Windir%\SysWOW64\WindowsPowerShell" powershell.exe')Do Set "powershell=%%v"

 %__APPDIR__%Findstr.exe /Blic:"| Error:" "%File%" > nul && (
  "%Powershell%" -nologo -noprofile -command ^
   "($Contents = (GC "%File%")) -match '^.*?[|][|][A][C][C][O][U][N][T]' | ForEach{"$_" -replace '\""' -replace '^.*?[|][|][A][C][C]','ACC' -replace '[|]' -replace ''''}; "$Contents" -match '^[|] [E][r][r][o][r][:] \d+' | ForEach {"$_"  -replace '\""' -replace '^[|] [E][r][r][o][r][:]\s\d+\s[|]\s\w+[_][0-9a-z]+\s[|]\s' -replace '\s[|]\s',',' -replace '[,]$'}; out-file -filepath .\output.txt"
  type output.txt
 )

Goto:Eof


Forgive the excessively long powershell command.

for the sake of reviewing the line the same command escaped as a multiline command:

Code: Select all

 %__APPDIR__%Findstr.exe /Blic:"| Error:" "%File%" > nul && (
  "%Powershell%" -nologo -noprofile -command ^
   ($Contents = (GC "%File%"^)^) -match '^^.*?[^|][^|][A][C][C][O][U][N][T]' ^|^
     ForEach{"$_" -replace '\""' -replace '^^.*?[^|][^|][A][C][C]','ACC' -replace '[^|]' -replace ''''};^
    "$Contents" -match '^^[^|] [E][r][r][o][r][:] \d+' ^|^
     ForEach {"$_"  -replace '\""' -replace '^^[^|] [E][r][r][o][r][:]\s\d+\s[^|]\s\w+[_][0-9a-z]+\s[^|]\s' -replace '\s[^|]\s',',' -replace '[,]$'};^
     out-file -filepath .\output.txt
  type output.txt
 )
I wonder "Where the trivial PS code is?" :shock:

IMHO in this type of answers the poster should warn: "This PS solution is more difficult and convoluted than a Batch file, but here it is anyway"... :?

Antonio

Re: Best way to pull out specifc lines from text file? JREPL?

Posted: 24 Sep 2021 04:10
by T3RRY
Aacini wrote:
23 Sep 2021 21:48

I wonder "Where the trivial PS code is?" :shock:

IMHO in this type of answers the poster should warn: "This PS solution is more difficult and convoluted than a Batch file, but here it is anyway"... :?

Antonio
Sorry Antonio, but I fail to see what's not trivial about using regex to match desired patterns and replace undesired substrings.
I readily accept the same result can be achieved more efficiently using batch alone for the given example data, however consider it worth presenting a regex solution that can easily be adapted to other data structures, considering the regex supported by findstr is very poor.

Just my view.

Re: Best way to pull out specifc lines from text file? JREPL?

Posted: 24 Sep 2021 06:25
by ShadowThief
Why did you put each regex character in its own set of class brackets? [A][C][C][O][ U][N][T] could have just been ACCOUNT, etc.

It's like you deliberately tried to make your code as unreadable as possible.