Match Up

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

foncesa
Posts: 42
Joined: 12 Sep 2013 00:09

Match Up

#1 Post by foncesa » 12 Jan 2015 22:31

Hi,

I want to compare the lines of a text file up to the 45th character, and if any duplicates are found, I would like them copied to a new text file.

Sample Text File
Title ABC & Co.
10366700005501800000e006000000000003148600000e006059327300000000000000011205920289592524711849252471180917724
036733861460000000000095695300003440760000e006000000000003953610000e00022641877060775336482700000000000232434
00500000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
053936274630000000000097217198593495480428e006000000000000002264187706077533648270000000000074059119530430508
3953610000e00500000000000000015475893780529948335000000000000000226418770607753364827000000069530000344076000
00500000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242


Duplicate found text file
00500000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
00500000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Match Up

#2 Post by foxidrive » 13 Jan 2015 05:53

You've shown two lines that match.

What will happen if there are multiple lines that match? Do you want every matching line in the file?

Code: Select all

00350000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
00350000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242
00500000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
00500000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242
00600000000000000027349091590936527851800000000000695300003440760000e0060000000000095695300003440760000e00622
00600000000000000027349091590936527851800000e0060593273000000000000000112059202895925247118492524711809177242

foncesa
Posts: 42
Joined: 12 Sep 2013 00:09

Re: Match Up

#3 Post by foncesa » 13 Jan 2015 06:55

Hi,

Thanks, foxidrive, for the response.

foxidrive wrote:You've shown two lines that match.
What will happen if there are multiple lines that match? Do you want every matching line in the file?

Yes, I want all of the matching lines in the file.

OperatorGK
Posts: 66
Joined: 13 Jan 2015 06:55

Re: Match Up

#4 Post by OperatorGK » 13 Jan 2015 07:10

Try this:

Code: Select all

@echo off
del matches.txt
del temp.txt
for /f "tokens=*" %%a in (file.txt) do (
  2>&1 >nul findstr "%%a" temp.txt && echo:%%a >> matches.txt
  echo:%%a >> temp.txt
)
del temp.txt

Replace file.txt with your filename.

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Match Up

#5 Post by Squashman » 13 Jan 2015 07:31

Re-read the original question again. Your code is missing one critical piece.

OperatorGK
Posts: 66
Joined: 13 Jan 2015 06:55

Re: Match Up

#6 Post by OperatorGK » 13 Jan 2015 07:36

Aw, sorry!
Code:

Code: Select all

@echo off
setlocal enabledelayedexpansion
del matches.txt
del temp.txt
for /f "tokens=*" %%a in (file.txt) do (
  set str=%%a
  2>&1 >nul findstr "%str:~0,45%" temp.txt && echo:%%a >> matches.txt
  echo:%%a >> temp.txt
)
del temp.txt

This should work as expected.

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Match Up

#7 Post by Squashman » 13 Jan 2015 07:48

If you change the search variable to use delayed expansion it might.
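
A minimal sketch of that change, reusing the file names from OperatorGK's script (file.txt, temp.txt, matches.txt). The substring is written as !str:~0,45! so that it is expanded on every iteration; the percent form is expanded only once, when the whole parenthesized block is parsed, before str has a value. The /b, /l and /c: switches on findstr and the up-front creation of temp.txt are additions made for this sketch, not part of the original post.

```batch
@echo off
setlocal EnableDelayedExpansion
rem File names follow the post above. Prefix length 45 is kept from that post.
if exist matches.txt del matches.txt
rem Create an empty temp.txt so the first findstr call has a file to read.
type nul > temp.txt
for /f "usebackq delims=" %%a in ("file.txt") do (
  set "str=%%a"
  rem Delayed expansion: the substring is re-evaluated for every line.
  findstr /b /l /c:"!str:~0,45!" temp.txt >nul 2>&1 && >>matches.txt echo(%%a
  >>temp.txt echo(%%a
)
del temp.txt
```

Lines that contain exclamation marks, quotes or backslashes would probably need extra handling, both for delayed expansion and for the findstr search string.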

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Match Up

#8 Post by Squashman » 13 Jan 2015 09:35

Actually I don't think that is going to work. The user wants both records that match in the match file.
If the 1st record in the file matches the 2nd record your code will only put the 2nd record into the match file.

foncesa
Posts: 42
Joined: 12 Sep 2013 00:09

Re: Match Up

#9 Post by foncesa » 13 Jan 2015 10:06

Hi,

Thanks for the response.

It's true, Squashman, it's not working: it does not generate matches.txt (the output file of duplicate lines), even though file.txt contains duplicate lines.

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Match Up

#10 Post by Squashman » 13 Jan 2015 11:45

We have third-party software where I work that does this, but it requires that you SORT the file on the positions you are matching, which makes complete sense when you think about it. With the file sorted, the program only has to hold the previous record in memory to see if the following records match it. If you had to hold all the previous records in memory, you would probably get out-of-memory errors on large files.

Our software actually outputs two files.
1) Deduped file.
2) All duplicates. This includes the one record that went to the deduped file.
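
The sort-then-compare idea can be sketched in batch roughly as follows. This is an illustration only, not the third-party tool: file.txt, sorted.tmp and matches.txt are placeholder names, and the 44-character key length is borrowed from Aacini's script in this thread. Because sort orders whole lines, lines that share a key should end up adjacent, so only the previous record has to be kept.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Placeholder file names, assumed for this sketch.
set "in=file.txt"
set "out=matches.txt"
set "prev="
set "prevKey="
set "written="
rem Sort the whole file so that records with the same leading key are adjacent.
sort "%in%" /o sorted.tmp
(for /f "usebackq delims=" %%a in ("sorted.tmp") do (
  set "line=%%a"
  set "key=!line:~0,44!"
  if /i "!key!"=="!prevKey!" (
    rem First repeat of this key: also emit the record that was being held.
    if not defined written echo(!prev!
    echo(%%a
    set "written=1"
  ) else (
    set "written="
  )
  set "prev=%%a"
  set "prevKey=!key!"
)) > "%out%"
del sorted.tmp
```

The duplicates come out in sorted order rather than in their original order, and the comparison is case-insensitive to match the behaviour of sort. Lines containing exclamation marks or quotes would likely need extra care.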

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Match Up

#11 Post by Aacini » 13 Jan 2015 15:28

Code: Select all

@echo off
setlocal EnableDelayedExpansion

(for /F "delims=" %%a in (SampleTextFile.txt) do (
   set "line=%%a"
   for /F %%d in ("!line:~0,44!") do (
      if defined dup[%%d] (
         if "!dup[%%d]!" neq "0" (
            echo %%d!dup[%%d]!
            set /A "dup[%%d]=0"
         )
         echo %%a
      ) else (
         set "dup[%%d]=!line:~44!"
      )
   )
)) > DuplicateFound.txt


Antonio

foncesa
Posts: 42
Joined: 12 Sep 2013 00:09

Re: Match Up

#12 Post by foncesa » 13 Jan 2015 22:26

Hi,

Thanks, Antonio, it's perfect.

Many Thanks.

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Match Up

#13 Post by Squashman » 14 Jan 2015 07:30

Antonio,
Would that grow the environment with a lot of variables and possibly max out the environment?
How big of a file have you tested this on?

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Match Up

#14 Post by Aacini » 14 Jan 2015 12:19

Squashman wrote:Antonio,
Would that grow the environment with a lot of variables and possibly max out the environment?
How big of a file have you tested this on?


The maximum size of the environment is 64 MB, so I don't think this will be a problem here unless the files the OP needs to process are very large. However, the process does become slower as the environment grows, as discussed extensively in dbenham's thread on the subject. If the files are reasonably sized, this solution should solve the problem in a reasonable time, but only the OP can answer this question.

@foncesa, could you post the size of the file and the time this solution takes to get the result? Do you think this is a reasonable solution? TIA

If necessary, the excessive-time problem can be reduced to a certain degree with the techniques I described in the several replies I posted in that thread. With an environment size of 1314669, the timing test written by Dave ran in 60% less time when those techniques were used, as shown in one of those replies, and the percentage gain will increase as the environment size increases.

Antonio

PS - The 64 MB max size for the environment is documented here, below "Setting environment variables":
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds_shelloverview.mspx?mfr=true

catalinnc
Posts: 39
Joined: 12 Jan 2015 11:56

Re: Match Up

#15 Post by catalinnc » 14 Jan 2015 14:09

this one is very light on variables...feel free to edit it to make it more readable...tested on win xp sp3 up to date...

Code: Select all

@echo off
setlocal enabledelayedexpansion

echo time is %time% for starting operation

type nul > "_duplicate found.txt"
set "_entry=0"

for /f "usebackq delims=" %%A in ("_sample text file.txt") do (
    set /a "_entry+=1"
    echo processing entry !_entry!

    set "_string1=%%A"
    set "_string1=!_string1:~0,44!"
    set "_already_found=false"

    for /f "usebackq delims=" %%B in ("_duplicate found.txt") do (
        set "_string2=%%B"
        set "_string2=!_string2:~0,44!"
        if /i [!_string1!] equ [!_string2!] (set "_already_found=true")
    )

    set "_entry_skip=!_entry!"
    set "_first_time=true"

    for /f "usebackq delims=" %%C in ("_sample text file.txt") do (
        if /i [!_already_found!] neq [true] (
            set "_string3=%%C"
            set "_string3=!_string3:~0,44!"
            if /i [!_entry_skip!] equ [0] (
                if /i [!_string1!] equ [!_string3!] (
                    if /i [!_first_time!] equ [true] (
                        (
                            echo(
                            echo(
                            echo %%A
                            echo %%C
                        )>> "_duplicate found.txt"
                        set "_first_time=false"
                    ) else (
                        echo %%C >> "_duplicate found.txt"
                    )
                )
            ) else (set /a "_entry_skip-=1")
        )
    )
)

echo time is %time% for ending operation
endlocal
pause

