How to determine the number of difference
Moderator: DosItHelp
-
- Posts: 5
- Joined: 03 Oct 2011 03:13
How to determine the number of difference
Hey
I need a batch script that compares two text files and determines the number of different lines.
For example, file1.txt contains
aaaaaa
bbbbbb
cccccc
And file2.txt contains
aaaaaa
eeeeee
bbbbbb
cccccc
The script must display the number of differences: 1
Thanks in advance.
Re: How to determine the number of difference
It's easy: I use the 'fc' command. There is also a 'comp' command, although I don't really understand the difference:
fc /?
comp /?
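For reference, a minimal sketch of checking fc's exit code from a batch file, assuming file1.txt and file2.txt sit in the current directory. fc sets ERRORLEVEL to 0 when the files match and 1 when they differ, so this only tells you whether the files differ, not how many lines differ.

```batch
@echo off
rem Sketch only: file1.txt and file2.txt are assumed to be in the current directory.
rem The comparison report is discarded; only the exit code is used.
fc file1.txt file2.txt >nul
if %errorlevel% equ 0 (
    echo The files are identical
) else if %errorlevel% equ 1 (
    echo The files differ
) else (
    rem Any other value means fc could not run the comparison, e.g. a missing file.
    echo The comparison could not be run
)
```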
Re: How to determine the number of difference
I know that is the purpose of comp or fc, but I need the number of lines,
like in the example I mentioned previously.
Re: How to determine the number of difference
That simple question is devilishly difficult to answer.
How many lines are different in the below example?
A
B
C
---------
B
C
A
Obviously the lines are all identical, but simply in a different order. Depending on your requirements, the answer could be 0, 1, 2 or 3.
I don't think you will find a ready built solution. If you really want to do this you will have to precisely define your rules and build it yourself. I doubt you want to attempt this with batch.
Dave Benham
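To illustrate how much the rule matters, here is a rough sketch of just one possible rule: a strict position-by-position comparison. It assumes the two files are named file1.txt and file2.txt, contain no blank lines, and contain no special characters such as ! or double quotes. Extra lines at the end of file2.txt are ignored. The variable names diff and line2 are only illustrative. Under this rule the A/B/C example counts as 3 differences, whereas a rule that ignores order would count 0.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sketch only: compares line N of file1.txt with line N of file2.txt.
set /a diff=0
rem file2.txt is redirected into the block so that SET /P reads it line by line
rem while FOR /F walks through file1.txt.
<"file2.txt" (
    for /F "usebackq delims=" %%a in ("file1.txt") do (
        set "line2="
        set /p "line2="
        if not "!line2!"=="%%a" set /a diff+=1
    )
)
echo Lines that differ by position: %diff%
```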
Re: How to determine the number of difference
Thanks.
The idea is to read each line from file1.txt into a variable, then search for that variable's value in file2.txt.
How can this be done?
Re: How to determine the number of difference
So what would you expect the result to be in my A|B|C example?
Re: How to determine the number of difference
I came up with a solution:
Here is the script:
for /F "tokens=*" %%a in ('type file1.txt') do ( find /c "%%a" file2.txt )
Its role is to check whether each line of the first file exists in the second: it shows the number of occurrences of each line of file1 in file2.
I still need to add this code:
if errorlevel 1 Compt+=1
Its role is to increment "Compt" every time a line from file1 is not found in file2.
But I cannot get it to work inside the DO block.
I need an idea, please.
Re: How to determine the number of difference
Finally, I have the solution:
@echo off
for /F "tokens=*" %%a in ('type file1.txt') do (
find /c "%%a" file2.txt
if errorlevel 1 set /a Compt+=1
)
for /F "tokens=*" %%b in ('type file2.txt') do (
find /c "%%b" file1.txt
if errorlevel 1 set /a Compt+=1
)
echo The number of differences is %Compt%
PAUSE
Enjoy IT !!!
Re: How to determine the number of difference
Good - you figured out how to incorporate the failure test. There is a simpler way to do the same thing using the cmd && (success commands go here) || (failure commands go here) construct. You only need the failure portion, plus you don't care about the output so you can redirect it to nul, plus you don't need the /C option:
for /F "tokens=*" %%a in ('type file1.txt') do (
find "%%a" file2.txt >nul || set /a Compt+=1
)
You still have problems.
1) Your code searches to see if a file1 line is anywhere within a line in file2. This means file1 line could be a substring of file2 line. But you want an exact match. This can easily be solved by switching to FINDSTR with /B /E /C:"%%a" options.
2) Your FOR loop currently skips blank lines as well as lines that begin with ; (implicit FOR "EOL=;" option). There are ways to get around this, but it's not worth it until you solve more critical problems.
3) What if identical lines appear multiple times within either or both files? The number of appearances should be the same for both, yes?
4) Does line order really not matter? These two files seem very different to me, yet your algorithm will treat them as having no differences:
File 1
Name=George
Age=32
Name=Fred
Age=21
File 2
Name=George
Age=21
Name=Fred
Age=32
5) Even if you solve the above issues - you still have a nasty judgement call to make:
Example 1
File 1
Name=George
Age=32
File 2
Name=George
Age=15
Most people would say there is one difference between the files.
Example 2
File 1
Name=George
Age=32
File 2
Name=George
Hobby=soccer
I think most people would expect the difference count to be two in this case. (file 1 is missing hobby, file 2 is missing age)
What algorithm will give the "correct" answer to both Example 1 and Example 2? This kind of problem is difficult enough, but to try to tackle it using batch programming seems like a bad idea.
Dave Benham
Last edited by dbenham on 03 Oct 2011 17:35, edited 1 time in total.
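A rough sketch of point 1 only, assuming the same file1.txt and file2.txt as in the earlier posts. FINDSTR's /X option matches whole lines only, which has the same effect as /B and /E together, and /L forces a literal search. This is not a complete solution: it still skips blank lines and lines starting with ;, ignores how often a line appears, and ignores line order (points 2 to 4). FINDSTR can also mishandle lines containing quotes or backslashes, and files are assumed to use Windows line endings.

```batch
@echo off
setlocal
rem Sketch only: counts lines that have no exact whole-line match in the other file.
set /a Compt=0
rem Lines of file1.txt that do not appear as a whole line in file2.txt
for /F "usebackq delims=" %%a in ("file1.txt") do (
    findstr /L /X /C:"%%a" "file2.txt" >nul || set /a Compt+=1
)
rem Lines of file2.txt that do not appear as a whole line in file1.txt
for /F "usebackq delims=" %%b in ("file2.txt") do (
    findstr /L /X /C:"%%b" "file1.txt" >nul || set /a Compt+=1
)
echo Number of differences: %Compt%
```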
Re: How to determine the number of difference
Well, Dave, your first example shows 4 different lines, because that can't be the same George in both files.
I guess the problem can be solved using batch just as well as with any other language, but the rules have to be clearly defined. Without these rules we're groping in the dark.
Regards
aGerman