Extract UNIQUE rows ONLY from .txt file.

Posted: 07 Jun 2020 10:08
by PAB
Good afternoon,

I hope you are all keeping well and safe!

I have a .txt file that gets produced with many lines of text, which could include many duplicates.
Here is the part of the code that creates the .txt file . . .

Code: Select all

type "%tmp%" | findstr /I /G:"%Filter%" >> "%Output_File%"
[1] I want to exclude duplicates.
[2] I want the original order kept [ excluding the duplicates of course! ].

I already had a bit of code in my collection which I have adapted to do the above, which it does, but is there any way to incorporate it into my existing code above, please, instead of running it separately?

Code: Select all

@echo off

set "InputFile=C:\Users\System-Admin\Desktop\Errors.txt"
set "OutputFile=C:\Users\System-Admin\Desktop\DISM_Errors2.txt"

set "PSScript=%Temp%\~tmpRemoveDupe.ps1"
if exist "%PSScript%" del /q /f "%PSScript%"
echo Get-Content "%InputFile%" ^| Get-Unique ^> "%OutputFile%" >> "%PSScript%"
set "PowerShellDir=C:\Windows\System32\WindowsPowerShell\v1.0"
cd /D "%PowerShellDir%"
Powershell -ExecutionPolicy Bypass -Command "& '%PSScript%'"
del "%PSScript%"
pause
goto :EOF
Thanks in advance.
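
One way to avoid the temporary .ps1 file is to hand the pipeline to PowerShell directly from the batch file. The following is a minimal sketch, assuming the same InputFile and OutputFile variables as above and paths that contain no single quotes. Get-Unique only collapses duplicates that sit next to each other, so the original order is kept.

```batch
@echo off
rem Sketch only: same variables as the script above, no temporary .ps1 file.
set "InputFile=C:\Users\System-Admin\Desktop\Errors.txt"
set "OutputFile=C:\Users\System-Admin\Desktop\DISM_Errors2.txt"

rem Get-Unique drops a line only when it equals the line directly before it.
rem The pipes inside the double quotes are passed to PowerShell untouched by cmd.exe.
powershell -NoProfile -Command "Get-Content '%InputFile%' | Get-Unique | Out-File '%OutputFile%'"
```

Because the whole PowerShell command is inside double quotes, the same line can also be placed inside a parenthesised IF or ELSE block.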

Re: Extract UNIQUE rows ONLY from .txt file.

Posted: 07 Jun 2020 10:41
by Hackoo
Hi :)
You can try it like this:

Code: Select all

@echo off
set "InputFile=C:\Users\System-Admin\Desktop\Errors.txt"
set "OutputFile=C:\Users\System-Admin\Desktop\DISM_Errors2.txt"
Call :RemoveDuplicateEntry %InputFile% %OutputFile%
Pause & Exit
::----------------------------------------------------
:RemoveDuplicateEntry <InputFile> <OutPutFile>
Powershell  ^
$Contents=Get-Content '%1';  ^
$LowerContents=$Contents.ToLower(^);  ^
$LowerContents ^| select -unique ^| Out-File '%2'
Exit /b
::----------------------------------------------------

Re: Extract UNIQUE rows ONLY from .txt file.

Posted: 07 Jun 2020 11:51
by PAB
Thanks for the reply, it is appreciated.

I have tried all different ways of getting this to work but I get at least one error on the PS side. One being . . .
Method invocation failed because [System.Object[]] doesn't contain a
method named 'ToLower'.
At line:1 char:97
+ $Contents=Get-Content 'C:\Users\System-Admin\Desktop\Dups.txt'; $Lo
werContents=$Contents.ToLower <<<< (); $LowerContents | select -uniqu
e | Out-File 'C:\Users\System-Admin\Desktop\DISM_Errors.txt'
+ CategoryInfo : InvalidOperation: (ToLower:String) [],
RuntimeException
+ FullyQualifiedErrorId : MethodNotFound
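
The <<<< marker in the error output suggests PowerShell 2.0, which does not apply a method call to each element of an array. Get-Content returns an array of lines, so calling ToLower() on the whole result fails. A possible workaround, sketched here on the assumption that lower-casing every line is acceptable, is to call ToLower() on each line inside ForEach-Object. The tilde forms of the arguments strip surrounding double quotes, so the paths can be quoted at the call site.

```batch
@echo off
rem Sketch only: the paths are the ones used earlier in the thread.
set "InputFile=C:\Users\System-Admin\Desktop\Errors.txt"
set "OutputFile=C:\Users\System-Admin\Desktop\DISM_Errors2.txt"
call :RemoveDuplicateEntry "%InputFile%" "%OutputFile%"
pause & exit /b

:RemoveDuplicateEntry
rem Argument 1 is the input file, argument 2 is the output file.
rem ToLower is called on each line, not on the whole array.
powershell -NoProfile -Command "Get-Content '%~1' | ForEach-Object { $_.ToLower() } | Select-Object -Unique | Out-File '%~2'"
exit /b
```

Select-Object -Unique keeps the first occurrence of each line, so the order of the remaining lines is unchanged.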

Code: Select all

) else (
  type "%tmp%" | findstr /I /G:"%Filter%" >> "%InputFile%"
  Call :RemoveDuplicateEntry %InputFile% %OutputFile%
  echo. & echo ^>Press ANY key to EXIT . . . & pause >nul
  goto :Exit
)
:Exit

:RemoveDuplicateEntry <InputFile> <OutputFile>
Powershell  ^
$Contents=Get-Content '%1';  ^
$LowerContents=$Contents.ToLower(^);  ^
$LowerContents ^| select -unique ^| Out-File '%2'
Exit /b
UPDATE:

It is important that the file is NOT sorted.
The .txt file is a log file sorted by timestamp (yyyy-mm-dd hh-mm-secs), so when there are duplicate rows they are next to each other anyway, much as if the file had already been sorted.
The code I posted previously works great, except that I need this done within the same batch file rather than having a separate file perform it!

Thanks in advance.
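
Because the duplicate rows sit next to each other in this log, comparing each line with the one before it is enough, and the order is never touched. The following is a rough pure-batch sketch of that idea. It assumes InputFile and OutputFile are already set and that the lines contain no exclamation marks, which delayed expansion would strip. FOR /F also skips empty lines and, by default, lines that begin with a semicolon.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sketch only: InputFile and OutputFile are assumed to be set by the caller.
set "prev="
(
  for /f "usebackq delims=" %%a in ("%InputFile%") do (
    rem Write the line only when it differs from the previous line.
    if not "%%a"=="!prev!" echo(%%a
    set "prev=%%a"
  )
) > "%OutputFile%"
endlocal
```

The comparison is case sensitive. Adding /I to the IF would make it ignore case.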

Re: Extract UNIQUE rows ONLY from .txt file.

Posted: 07 Jun 2020 15:02
by PAB
OK, this actually works . . .

Code: Select all

) else (
  cls
  del "%Input_File%"
  type "%tmp%" | findstr /I /G:"%Filter%" >> "%Input_File%"
  del "%tmp%"
  echo. > "%Output_File%" & echo ERRORS FOUND . . . >> "%Output_File%" & echo. >> "%Output_File%"
  for /f "tokens=* delims= " %%a in (%Input_File%) do (
  find "%%a" < "%Output_File%" >nul || >> "%Output_File%" echo.%%a
  )
  del "%Input_File%"
  echo. & echo ^>Press ANY key to EXIT . . . & pause >nul
  goto :Exit
)
:Exit
One question please.

Throughout my code I have added speech marks around my path variables as I always do.
For some reason, however, if I add speech marks around the file name, as in ("%Input_File%") do, it returns a single line containing the path and the file name.
Without the speech marks it returns the list of unique rows as expected!

Thanks in advance.

Re: Extract UNIQUE rows ONLY from .txt file.

Posted: 30 Jul 2020 16:43
by Squashman
Read the help file for the FOR command.

Code: Select all

usebackq        - specifies that the new semantics are in force,
                  where a back quoted string is executed as a
                  command and a single quoted string is a
                  literal string command and allows the use of
                  double quotes to quote file names in
                  file-set.
If you don't use this option, FOR /F treats a double-quoted IN clause as a literal string rather than a file name, which is why the loop returns the path itself.
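
Applied to the loop from the earlier post, the option might be used as follows. This is a sketch only: the two paths are hypothetical examples, and the rest of the loop is left as originally posted.

```batch
@echo off
rem Sketch only: hypothetical example paths, adjust to the real files.
set "Input_File=%USERPROFILE%\Desktop\Errors.txt"
set "Output_File=%USERPROFILE%\Desktop\DISM_Errors.txt"

rem Start with an empty output file so that FIND has something to read.
type nul > "%Output_File%"

rem usebackq makes FOR /F read the double-quoted name as a file,
rem not as a literal string to be parsed.
for /f "usebackq tokens=* delims= " %%a in ("%Input_File%") do (
  find "%%a" < "%Output_File%" >nul || >> "%Output_File%" echo.%%a
)
```

With usebackq in place, the quotes can stay around the file name, so paths that contain spaces also work.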