How to determine the number of differences

#1 Post by linconnue55 » 03 Oct 2011 03:22

Hey,

I need a batch script that compares two text files and determines the number of lines that differ.

For example, file1.txt contains:
aaaaaa
bbbbbb
cccccc
And file2.txt contains:
aaaaaa
eeeeee
bbbbbb
cccccc

The script must display the number of differences, which is 1 in this case.

Thanks in advance.

#2 Post by Ed Dyreen » 03 Oct 2011 05:36

It's easy: I use the 'fc' command. There is also a 'comp' command, although I don't really understand the difference between the two:

Code:

fc /?
comp /?

#3 Post by linconnue55 » 03 Oct 2011 07:20

I know that is the purpose of comp or fc, but I need the number of lines that differ, as in the example I mentioned previously.

#4 Post by dbenham » 03 Oct 2011 08:02

That simple question is devilishly difficult to answer.

How many lines are different in the example below?

A
B
C
---------
B
C
A

Obviously the lines are all identical, but simply in a different order. Depending on your requirements, the answer could be 0, 1, 2 or 3.

I don't think you will find a ready-built solution. If you really want to do this, you will have to precisely define your rules and build it yourself. I doubt you want to attempt this with batch.

Dave Benham

#5 Post by linconnue55 » 03 Oct 2011 08:26

Thanks,

The idea is to store each line from file1.txt in a variable, then search file2.txt for the value of that variable.

How can this be done?

#6 Post by dbenham » 03 Oct 2011 09:10

So what would you expect the result to be in my A|B|C example?

#7 Post by linconnue55 » 03 Oct 2011 13:56

I came up with a solution. Here is the script:

Code:

for /F "tokens=*" %%a in ('type file1.txt') do ( find /c "%%a" file2.txt )


Its role is to check whether each line of the first file exists in the second. It shows the number of occurrences of each line of file1 in file2.

I still need to add this code:

Code:

if errorlevel 1 Compt+=1

Its role is to increment "Compt" every time a line from file1 is not found in file2.

But I cannot get it to work inside the DO block.

I need an idea, please.

#8 Post by linconnue55 » 03 Oct 2011 16:14

Finally, I have the solution:

Code:

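@echo off
rem Start from zero so the final echo prints 0 when nothing differs.
set /a Compt=0

rem Count the lines of file1.txt that are not found in file2.txt.
for /F "tokens=*" %%a in ('type file1.txt') do (
  find /c "%%a" file2.txt
  if errorlevel 1 set /a Compt+=1
)

rem Count the lines of file2.txt that are not found in file1.txt.
for /F "tokens=*" %%b in ('type file2.txt') do (
  find /c "%%b" file1.txt
  if errorlevel 1 set /a Compt+=1
)
echo The number of differences is %Compt%

PAUSE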


Enjoy IT !!! :mrgreen:

#9 Post by dbenham » 03 Oct 2011 16:48

Good - you figured out how to incorporate the failure test. There is a simpler way to do the same thing using the cmd && (success commands go here) || (failure commands go here) construct. You only need the failure portion. You also don't care about the output, so you can redirect it to nul, and you don't need the /C option:

Code:

for /F "tokens=*" %%a in ('type file1.txt') do (
  find "%%a" file2.txt >nul || set /a Compt+=1
)


You still have problems.

1) Your code searches to see if a file1 line is anywhere within a line in file2. This means file1 line could be a substring of file2 line. But you want an exact match. This can easily be solved by switching to FINDSTR with /B /E /C:"%%a" options.

2) Your FOR loop currently skips blank lines as well as lines that begin with ; (implicit FOR "EOL=;" option). There are ways to get around this, but it's not worth it until you solve more critical problems.

3) What if identical lines appear multiple times within either or both files? The number of appearances should be the same for both, yes?

4) Does line order really not matter? These two files seem very different to me, yet your algorithm will treat them as having no differences:
File 1
Name=George
Age=32
Name=Fred
Age=21

File 2
Name=George
Age=21
Name=Fred
Age=32

5) Even if you solve the above issues - you still have a nasty judgement call to make:

Example 1
File 1
Name=George
Age=32

File 2
Name=George
Age=15

Most people would say there is one difference between the files.

Example 2
File 1
Name=George
Age=32

File 2
Name=George
Hobby=soccer

I think most people would expect the difference count to be two in this case. (file 1 is missing hobby, file 2 is missing age)

What algorithm will give the "correct" answer to both Example 1 and Example 2? This kind of problem is difficult enough, but to try to tackle it using batch programming seems like a bad idea.
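
As a rough sketch of point 1 only, assuming the file names from the first post: the loops below use FINDSTR with /B /E /L /C: so that a line counts as found only when it matches a whole line in the other file. This is an illustration under those assumptions, not a complete answer. It still skips blank lines and lines beginning with ; (point 2), ignores duplicates and line order (points 3 and 4), and it has not been checked against lines containing quotes or backslashes, which FINDSTR may treat specially.

```batch
@echo off
setlocal
rem Start from zero so the result prints 0 when every line is matched.
set /a Compt=0

rem Count lines of file1.txt that have no exact whole-line match in file2.txt.
for /F "usebackq delims=" %%a in ("file1.txt") do (
  findstr /B /E /L /C:"%%a" "file2.txt" >nul || set /a Compt+=1
)

rem Same test in the other direction.
for /F "usebackq delims=" %%b in ("file2.txt") do (
  findstr /B /E /L /C:"%%b" "file1.txt" >nul || set /a Compt+=1
)

echo Number of lines without an exact match: %Compt%
endlocal
```

For the two files in the first post this should report 1. For Example 1 and Example 2 above it counts two unmatched lines in both cases, so it does not settle the judgement call in point 5.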

Dave Benham
Last edited by dbenham on 03 Oct 2011 17:35, edited 1 time in total.

#10 Post by aGerman » 03 Oct 2011 17:35

Well Dave, your first example shows 4 different lines, because that can't be the same George in both files :lol:

I guess the problem can be solved in batch as well as in any other language, but the rules have to be clearly defined. Without these rules we're groping in the dark.

Regards
aGerman
