Remove Duplicate Files and Move them in the Bin.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

#16 Post by foxidrive » 19 Jun 2015 04:37

If you have one file called runme.bat that contains

Code:

@echo off
echo 1234


and another file called runme.bat that contains

Code:

@echo off
echo 1245


Then your comparison will treat them as the same file, but only because the name and file size are the same.

What is inside those two files can be totally different.
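
To make the point above concrete, here is a minimal sketch in Python (not batch; chosen only so the example is easy to run and check). The helper names `same_name_and_size`, `same_content`, and `make_demo_files` are hypothetical and do not come from this thread. It recreates the two runme.bat files in separate folders and applies both tests to them.

```python
import filecmp
import os
import tempfile


def same_name_and_size(path_a, path_b):
    """Weak test from the thread: file name and byte size both match."""
    return (os.path.basename(path_a) == os.path.basename(path_b)
            and os.path.getsize(path_a) == os.path.getsize(path_b))


def same_content(path_a, path_b):
    """Strong test: compare the actual bytes (shallow=False forces a read)."""
    return filecmp.cmp(path_a, path_b, shallow=False)


def make_demo_files(root):
    """Create two runme.bat files like the ones above, in folders a and b."""
    paths = []
    for folder, number in (("a", "1234"), ("b", "1245")):
        os.makedirs(os.path.join(root, folder))
        path = os.path.join(root, folder, "runme.bat")
        # newline="" keeps the explicit CRLF line endings unchanged.
        with open(path, "w", newline="") as handle:
            handle.write("@echo off\r\necho " + number + "\r\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        first, second = make_demo_files(root)
        print("same name and size:", same_name_and_size(first, second))
        print("same content:", same_content(first, second))
```

For these two files the name-and-size test reports a match while the content test does not, which is the false positive described above.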

mingolito
Posts: 28
Joined: 04 Dec 2014 11:34

#17 Post by mingolito » 19 Jun 2015 05:37

OK... Then you should also compare the file signature ("hash") to determine whether a file in a different folder, with the same name and size, is 100% identical to another file in another folder. :mrgreen:

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

#18 Post by foxidrive » 19 Jun 2015 07:33

Yes. You need a third-party tool, or look at carlos' post from recent weeks for his native batch script to compare files.

viewtopic.php?f=3&t=6439

mingolito
Posts: 28
Joined: 04 Dec 2014 11:34

#19 Post by mingolito » 19 Jun 2015 08:56

Great job by carlos :) I did not fully understand it, but does it only work with PDF files? :?:

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#20 Post by dbenham » 19 Jun 2015 08:59

If a hash or checksum were precomputed and available, that would be a good test. But Windows does not give you that. So the simplest thing to do is compare the files with FC. But don't use FC unless name and size are identical. I would use something like the following to test whether files with the same name and size are truly identical:

Code:

fc /t /lb1 file1 file2 2>nul && echo files are identical

But even if the files are identical, there could be a good reason for both to exist.

Sure, there may be situations where you want to delete true duplicates. But I would never recommend that anyone blindly delete duplicates as you propose.


Dave Benham
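
The approach described above (compare contents only when name and size already match) could be sketched as follows. This is an illustration in Python under stated assumptions, not the script from this thread: `find_true_duplicates` is a hypothetical name, file names are matched case-insensitively as on Windows, and `filecmp.cmp(..., shallow=False)` stands in for FC.

```python
import filecmp
import os
from collections import defaultdict


def find_true_duplicates(root):
    """Return (kept, duplicate) path pairs under root with identical bytes."""
    candidates = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            # Cheap key first: only files sharing name and size get compared.
            candidates[(name.lower(), os.path.getsize(path))].append(path)

    pairs = []
    for paths in candidates.values():
        originals = []
        for path in sorted(paths):
            for original in originals:
                # Expensive step: byte-for-byte comparison of two candidates.
                if filecmp.cmp(original, path, shallow=False):
                    pairs.append((original, path))
                    break
            else:
                # No earlier file has the same bytes, so keep this one.
                originals.append(path)
    return pairs
```

The function only reports pairs. Deciding what to do with them is left to the caller, in line with the caution above about blindly deleting duplicates.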

mingolito
Posts: 28
Joined: 04 Dec 2014 11:34

#21 Post by mingolito » 19 Jun 2015 09:16

dbenham wrote:But I would never recommend that anyone blindly delete duplicates as you propose.

Oh yep, in fact thanks to the work of @foxidrive we were able to create a backup folder, to prevent permanent loss of files.
Prevention is Better than Cure... :wink:
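
As an illustration of the backup-folder idea (not the batch script referred to above), a duplicate can be moved aside instead of deleted. The sketch below is Python, and `move_to_backup` and its numeric-suffix naming scheme are assumptions made for this example only.

```python
import os
import shutil


def move_to_backup(path, backup_dir):
    """Move path into backup_dir, adding a numeric suffix on name clashes."""
    os.makedirs(backup_dir, exist_ok=True)
    stem, extension = os.path.splitext(os.path.basename(path))
    target = os.path.join(backup_dir, stem + extension)
    counter = 1
    # Never overwrite a file that is already in the backup folder.
    while os.path.exists(target):
        target = os.path.join(backup_dir, f"{stem}_{counter}{extension}")
        counter += 1
    shutil.move(path, target)
    return target
```

Because nothing is overwritten or deleted, a wrongly flagged file can be restored by moving it back.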

npocmaka_
Posts: 514
Joined: 24 Jun 2013 17:10
Location: Bulgaria

#22 Post by npocmaka_ » 19 Jun 2015 09:39

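A simple way to generate a checksum/hash code with certutil:

Code:

rem Keep the hash line only: skip the header line and filter out the trailing status line
for /f "skip=1 delims=" %%# in ('CertUtil -hashfile "C:\somefile" MD5^|find /v "CertUtil: -hashfile command completed successfully."') do set "hashcode=%%#"
rem Remove any spaces CertUtil puts between the hash bytes
echo %hashcode: =%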



You can use the following hash algorithms: MD2, MD4, MD5, SHA1, SHA256, SHA384, SHA512 (in uppercase).
Keep in mind that certutil is not installed by default on Windows 2003/XP.

MAKECAB also provides its own checksum algorithm; this script can be used directly.
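
For comparison, the same kind of checksum can be computed without CertUtil. This is a minimal Python sketch, assuming Python is available on the machine; `file_hash` is a hypothetical helper. `hashlib.new` accepts names such as "md5", "sha1", and "sha256", while the availability of other algorithms depends on the local build.

```python
import hashlib


def file_hash(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with different digests are certainly different. Equal digests make identical content very likely, and a byte comparison can confirm it.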

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#23 Post by dbenham » 19 Jun 2015 09:48

But is that faster than FC? I suppose it could be, given that the CERTUTIL technique never needs to read a file more than once, whereas FC must always read the root file for each comparison made. But I don't know...

FC with /LB1 can abort the comparison as soon as it finds a difference.


Dave Benham
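
The trade-off raised above can be illustrated with two small Python functions. This is a sketch and not a benchmark: `differ_early` and `group_by_hash` are hypothetical names, SHA-256 is an arbitrary choice of hash, and each file is read into memory whole for brevity.

```python
import hashlib
from collections import defaultdict


def differ_early(path_a, path_b, chunk_size=64 * 1024):
    """Pairwise compare that stops reading at the first differing chunk."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return True
            if not chunk_a:
                # Both files ended at the same point with no difference.
                return False


def group_by_hash(paths):
    """Read every file exactly once and bucket the paths by SHA-256 digest."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        groups[digest].append(path)
    # Only buckets with more than one path are duplicate candidates.
    return [group for group in groups.values() if len(group) > 1]
```

Pairwise comparison can stop early but rereads files once per pair. Hashing always reads a whole file, but only once no matter how many other files it is compared against.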

Aacini
Expert
Posts: 1910
Joined: 06 Dec 2011 22:15
Location: México City, México

#24 Post by Aacini » 19 Jun 2015 10:55

dbenham wrote:FC with /LB1 can abort the comparison as soon as it finds a difference.

Dave Benham

Hum, err... I am afraid not...

The FC's /LBn switch is used to group the number of consecutive different lines in just one reported group. As I said when I posted my FComp.bat program:

Aacini wrote:+ |You may use /1 FC switch for a finer isolating of mismatched sections
+ |(the default is /2). This way, two deleted sections separated by 2
+ |lines (instead of 3 by default) will be reported as two deleted
+ |sections instead of one large updated section, for example. However,
+ |in this case several updated sections separated by just one line will
+ |be reported separately with the same ending-beginning lines instead of
+ |as just one large updated section. You may tune up this parameter to
+ |fit your needs.

Antonio

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#25 Post by dbenham » 19 Jun 2015 11:57

It is not as simple as I laid out, but it does make it more likely to abort when a difference is found. I have a 9.9 MB file test1.test and another nearly identical file test2.test, except that it has one extra line of "a" at the beginning and end. Note the difference in results and timing with and without the /LB1 option:

Code:

D:\test>echo %time%&fc /t test1.test test2.test&call echo %^time%
13:47:35.40
Comparing files test1.test and TEST2.TEST
***** test1.test
ID3♥

***** TEST2.TEST
a
ID3♥

*****

***** test1.test
***** TEST2.TEST
a
*****

13:47:35.57

D:\test>echo %time%&fc /t /lb1 test1.test test2.test&call echo %^time%
13:47:52.33
Comparing files test1.test and TEST2.TEST
Resync Failed.  Files are too different.
***** test1.test
ID3♥
***** TEST2.TEST
a
*****

13:47:52.34


Dave Benham

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

#26 Post by foxidrive » 19 Jun 2015 23:39

That old gettimestamp.bat makes it so much easier to evaluate the elapsed time! ;)

Mind you, the time taken in your tests is not very long at all.
