Match Up
Posted: 12 Jan 2015 22:31
by foncesa
Hi,
I want to compare the lines of a text file up to the 45th character, and if any duplicates are found, those lines need to be copied to a new text file.
Sample Text File
Title ABC & Co.
10366700005501800000e006000000000003148600000e006059327300000000000000011205920289592524711849252471180917724
036733861460000000000095695300003440760000e006000000000003953610000e00022641877060775336482700000000000232434
00500000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
053936274630000000000097217198593495480428e006000000000000002264187706077533648270000000000074059119530430508
3953610000e00500000000000000015475893780529948335000000000000000226418770607753364827000000069530000344076000
00500000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242
Duplicate found text file
00500000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
00500000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242
Re: Match Up
Posted: 13 Jan 2015 05:53
by foxidrive
You've shown two lines that match.
What will happen if there are multiple lines that match? Do you want every matching line in the file?
Code: Select all
00350000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
00350000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242
00500000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
00500000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242
00600000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
00600000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242
Re: Match Up
Posted: 13 Jan 2015 06:55
by foncesa
Hi,
Thanks, foxidrive, for the response.
foxidrive wrote:You've shown two lines that match.
What will happen if there are multiple lines that match? Do you want every matching line in the file?
Yes, I want all of the matching lines in the file.
Re: Match Up
Posted: 13 Jan 2015 07:10
by OperatorGK
Try this:
Code: Select all
@echo off
del matches.txt
del temp.txt
for /f "tokens=*" %%a in (file.txt) do (
2>&1 >nul findstr "%%a" temp.txt && echo:%%a >> matches.txt
echo:%%a >> temp.txt
)
del temp.txt
Replace file.txt with your filename.
Re: Match Up
Posted: 13 Jan 2015 07:31
by Squashman
OperatorGK wrote:Try this:
Re-read the original question again. Your code is missing one critical piece.
Re: Match Up
Posted: 13 Jan 2015 07:36
by OperatorGK
Aw, sorry!
Code:
Code: Select all
@echo off
setlocal enabledelayedexpansion
del matches.txt
del temp.txt
for /f "tokens=*" %%a in (file.txt) do (
set str=%%a
2>&1 >nul findstr "%str:~0,45%" temp.txt && echo:%%a >> matches.txt
echo:%%a >> temp.txt
)
del temp.txt
This should work as expected.
Re: Match Up
Posted: 13 Jan 2015 07:48
by Squashman
OperatorGK wrote:Aw, sorry!
This should work as expected.
If you change the search variable to use delayed expansion it might.
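As a small illustration of the difference (a sketch only, not part of the posted script; the variable name str is reused just for familiarity): inside a parenthesized block, the percent form is replaced once when the whole block is parsed, while the exclamation-mark form is evaluated each time the line runs.

```batch
@echo off
setlocal EnableDelayedExpansion
set "str=initial"
for %%a in (first second) do (
    set "str=%%a"
    rem Percent expansion happened once, when this whole block was parsed
    echo percent: %str%
    rem Delayed expansion reads the current value on every iteration
    echo delayed: !str!
)
```

Saved as a .bat file and run, this should print "percent: initial" and "delayed: first", then "percent: initial" and "delayed: second".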
Re: Match Up
Posted: 13 Jan 2015 09:35
by Squashman
Actually I don't think that is going to work. The user wants both records that match in the match file.
If the 1st record in the file matches the 2nd record your code will only put the 2nd record into the match file.
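One way to get every record of a matching group into the output is to count, for each line, how many lines of the file begin with the same key, and keep the line when the count is above 1. The following is a minimal sketch under assumptions: file.txt and matches.txt are the names used in the earlier posts, keyLen is a hypothetical setting for the number of leading characters to compare (44 compares characters 1 to 44; use 45 to include the 45th), and the data contains no quotes, backslashes, or exclamation marks.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Assumptions: file.txt is the input, matches.txt is the output, and
rem keyLen is the number of leading characters that must be equal.
set "keyLen=44"
(for /F "usebackq delims=" %%a in ("file.txt") do (
    set "line=%%a"
    set "key=!line:~0,%keyLen%!"
    set "count=0"
    rem Count the lines of file.txt that begin with this key
    for /F %%n in ('findstr /B /L /C:"!key!" "file.txt" ^| find /C /V ""') do set "count=%%n"
    rem More than one hit means the line belongs to a duplicate group
    if !count! GTR 1 echo(!line!
)) > "matches.txt"
```

This starts findstr once per input line, so it is likely to be slow on large files, but it keeps the matching lines in their original order.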
Re: Match Up
Posted: 13 Jan 2015 10:06
by foncesa
Hi,
Thanks for the response.
It's true, Squashman, it's not working: it does not generate matches.txt (the output file of duplicate lines), even though there are duplicate lines in file.txt.
Re: Match Up
Posted: 13 Jan 2015 11:45
by Squashman
We have 3rd party software where I work that does this, but it requires that you SORT the file on the positions you are matching, which makes complete sense when you think about it. That would be the only way to hold just the previous record in memory to see if the following records match it. If you had to hold all the previous records in memory you would probably get out of memory errors on large files.
Our software actually outputs two files.
1) Deduped file.
2) All duplicates. This includes the one record that went to the deduped file.
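The sorted approach described above can be sketched in batch as follows. This is only an illustration of the idea, not the third-party software: the file names file.txt, deduped.txt, and dupes.txt and the keyLen setting are assumptions, and the data is assumed to contain no exclamation marks and no lines starting with a semicolon.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Assumed names: file.txt is the input, deduped.txt and dupes.txt the outputs.
set "keyLen=44"
set "prevKey="
set "prevLine="
set "inGroup="
type nul > "deduped.txt"
type nul > "dupes.txt"
for /F "usebackq delims=" %%a in (`sort "file.txt"`) do (
    set "line=%%a"
    set "key=!line:~0,%keyLen%!"
    if /I "!key!"=="!prevKey!" (
        rem Same key as the previous record, so both belong in the duplicates file
        if not defined inGroup echo(!prevLine!>>"dupes.txt"
        echo(!line!>>"dupes.txt"
        set "inGroup=1"
    ) else (
        rem New key: the first record of each group goes to the deduped file
        echo(!line!>>"deduped.txt"
        set "inGroup="
    )
    set "prevKey=!key!"
    set "prevLine=!line!"
)
```

Only the previous record is held in variables, so the environment does not grow with the file. The trade-off is that both outputs come out in sorted order rather than in the original order. The comparison uses /I because sort ignores case by default.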
Re: Match Up
Posted: 13 Jan 2015 15:28
by Aacini
Code: Select all
@echo off
setlocal EnableDelayedExpansion
(for /F "delims=" %%a in (SampleTextFile.txt) do (
   set "line=%%a"
   for /F %%d in ("!line:~0,44!") do (
      if defined dup[%%d] (
         if "!dup[%%d]!" neq "0" (
            echo %%d!dup[%%d]!
            set /A "dup[%%d]=0"
         )
         echo %%a
      ) else (
         set "dup[%%d]=!line:~44!"
      )
   )
)) > DuplicateFound.txt
Antonio
Re: Match Up
Posted: 13 Jan 2015 22:26
by foncesa
Hi,
Thanks Antonio, it's perfect.
Many Thanks.
Re: Match Up
Posted: 14 Jan 2015 07:30
by Squashman
Antonio,
Would that grow the environment with a lot of variables and possibly max out the environment?
How big of a file have you tested this on?
Re: Match Up
Posted: 14 Jan 2015 12:19
by Aacini
Squashman wrote:Antonio,
Would that grow the environment with a lot of variables and possibly max out the environment?
How big of a file have you tested this on?
The maximum size of the environment is 64 MB, so I don't think this will be a problem here unless the files that the OP needs to process are very large. However, it is a fact that the process becomes slower as the size of the environment grows, as extensively discussed in this thread by dbenham. If the files are reasonably sized, this solution should solve the problem in a reasonable time, but only the OP can answer this question.
@foncesa, could you post the size of the file and the time this solution takes to get the result? Do you think this is a reasonable solution? TIA
If necessary, the excessive time problem can be reduced to a certain degree via the techniques I described in the several replies I posted in that thread. With an environment size of 1314669, the timing test written by Dave ran in 60% less time when those techniques were used, as shown in this reply, and the percentage of time gained will increase as the environment size increases.
Antonio
PS - The 64 MB max size for the environment is documented here, below "Setting environment variables":
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds_shelloverview.mspx?mfr=true
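To get a feel for how many variables an array-style approach adds to the environment, the variables sharing a prefix can be counted with set and find. The snippet below is a hedged sketch: it fills a hypothetical dup[ array with 1000 dummy entries purely for demonstration, then counts them.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Create 1000 dummy entries, for demonstration only
for /L %%i in (1,1,1000) do set "dup[%%i]=x"
set "count=0"
rem SET with a prefix lists every variable whose name starts with that prefix
for /F %%n in ('set dup[ ^| find /C /V ""') do set "count=%%n"
echo %count% variables start with dup[
```

Placing the same count after the main loop of a real script would show how many dup[ entries a given input file produces.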
Re: Match Up
Posted: 14 Jan 2015 14:09
by catalinnc
this one is very light on variables...feel free to edit it to make it more readable...tested on win xp sp3 up to date...
Code: Select all
@echo off
setlocal enabledelayedexpansion
echo time is %time% for starting operation
type nul > "_duplicate found.txt"
set "_entry=0"
for /f "usebackq delims=" %%A in ("_sample text file.txt") do (
    set /a "_entry+=1"
    echo processing entry !_entry!
    set "_string1=%%A"
    set "_string1=!_string1:~0,44!"
    set "_already_found=false"
    for /f "usebackq delims=" %%B in ("_duplicate found.txt") do (
        set "_string2=%%B"
        set "_string2=!_string2:~0,44!"
        if /i [!_string1!] equ [!_string2!] (set "_already_found=true")
    )
    set "_entry_skip=!_entry!"
    set "_first_time=true"
    for /f "usebackq delims=" %%C in ("_sample text file.txt") do (
        if /i [!_already_found!] neq [true] (
            set "_string3=%%C"
            set "_string3=!_string3:~0,44!"
            if /i [!_entry_skip!] equ [0] (
                if /i [!_string1!] equ [!_string3!] (
                    if /i [!_first_time!] equ [true] (
                        (
                            echo(
                            echo(
                            echo %%A
                            echo %%C
                        )>> "_duplicate found.txt"
                        set "_first_time=false"
                    ) else (
                        echo %%C >> "_duplicate found.txt"
                    )
                )
            ) else (set /a "_entry_skip-=1")
        )
    )
)
echo time is %time% for ending operation
endlocal
pause