Script of interest (obtained from https://stackoverflow.com/questions/116 ... -text-file):
@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "line=%file%.line"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
>"%deduped%" (
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    >"%line%" (echo !ln:\=\\!)
    >nul findstr /xlg:"%line%" "%deduped%" || (echo !ln!)
    endlocal
  )
)
>nul move /y "%deduped%" "%file%"
2>nul del "%line%"
I am looking for a modification to the above script so that duplicates are detected by comparing lines only from the 8th character onwards, that is, ignoring the first 7 characters even if they differ. If two lines are identical from the 8th character to the end of the line, the later line should be removed in its entirety (including its first 7 characters) and only the first occurrence kept. The first 8 characters of every line have the format "0.00.00_" (without the quotes).
Example...
For an original text file with the following 5 entries:
0.10.01_ABC_X
0.10.04_DEFG_Y
0.10.01_ABC_X
0.10.02_DEFG_Y
1.11.03_PQRST_M
I would like the output to be:
0.10.01_ABC_X
0.10.04_DEFG_Y
1.11.03_PQRST_M
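To pin down the rule I have in mind, here is a small Python sketch of the intended behaviour. It is only a reference for the logic, not the batch solution I am after; the function name `dedupe_ignoring_prefix` and the `skip` parameter are made up for illustration, and it assumes every line is at least 8 characters long.

```python
def dedupe_ignoring_prefix(lines, skip=7):
    """Keep the first line for each distinct tail.

    The "tail" is everything from character skip+1 onwards
    (the 8th character onwards when skip is 7).
    """
    seen = set()   # tails already encountered
    kept = []      # full lines to keep, in original order
    for line in lines:
        key = line[skip:]          # ignore the first `skip` characters
        if key not in seen:
            seen.add(key)
            kept.append(line)      # keep the whole line, prefix included
    return kept


sample = [
    "0.10.01_ABC_X",
    "0.10.04_DEFG_Y",
    "0.10.01_ABC_X",
    "0.10.02_DEFG_Y",
    "1.11.03_PQRST_M",
]

result = dedupe_ignoring_prefix(sample)
print("\n".join(result))
```

With the 5 sample entries above, this keeps the first, second and fifth lines, which is the output I listed.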
Thank you very much!