
[Solved] hELP

Posted: 21 Oct 2019 18:23
by sajjansinghania
I needed a batch file to find duplicate text files, and DosTips helped me with one. Since then the number of files has grown to over 800, but the batch file seems to handle only up to 713:
https://i.imgur.com/2915yCb.png
Some of the generated files have different sizes, and the script does not compare those.
Kindly help.
-----------------------------------------------------------------------------
@Echo Off
SetLocal
Set Prompt=$g$s
Rem Versioned backups are moved to GOOD and identical copies to the trash
Rem here: check for duplicate backup files
:: --------------------------------------------------------------------------
Rem Folder
PushD E:\12
:: --------------------------------------------------------------------------
Set Doubles=TRASH
Set Versions=GOOD
Echo Off
For %%i In (%Doubles% %Versions%) Do If Not Exist "%%i\" ( Md "%%i"
If ErrorLevel 1 Exit /B 1
)
For /F "Delims==" %%i In ('2^>Nul Set _') Do @ Set "%%i="
Set /A CountFiles=CountTrash=0
Rem Only files of the same size are compared
Rem Identical files (errorlevel = 0) will be marked / moved / removed
Rem This approach can be used for backups (comparison / creation)
For %%i In (*) Do ( SetLocal EnableDelayedExpansion
For /F "UseBackQTokens=1,2*" %%a In ('!CountFiles! !CountTrash! !_%%~zi!') Do (
EndLocal
Set /a CountFiles+=1
Set "EX="
For %%d In ( %%c ) Do If Not Defined EX (
>nul Fc "%%i" "%%~d" && (
Echo Weg %%i
>Nul Move "%%i" TRASH
Set /a CountTrash+=1, CountFiles-=1
Set /a Ex=1
)
)
If Not Defined Ex Set _%%~zi= %%c "%%i"
Title Files %%a Trash %%b
)
)
Title Files %CountFiles% Trash %CountTrash% Done
For /F "Delims==" %%i In ('2^>Nul Set _') Do @ Set "%%i="
Pause
Exit /B

Re: hELP

Posted: 22 Oct 2019 05:56
by ShadowThief
Get rid of that second setlocal enabledelayedexpansion and following endlocal; you don't need them and that's what's breaking the script

Re: hELP

Posted: 22 Oct 2019 20:56
by pieh-ejdsch
Hello,

The variable _<filesize> grows too long: the names of all files of one size that differ in content are collected in a single variable per size. In your case there are very many (about 600) files of the same size with different content, and each name is about 6 characters long. I would redesign the script to build a list per file size to compare against. However, when so many files share one size, going through that list is no longer efficient, because every file in the list is checked against each new file. Is there some other criterion in the file name that shows up front that the content differs?
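
A rough check of this explanation, assuming the limit in question is the 8191-character maximum command-line length of cmd.exe (the thread does not state this explicitly). The variable name perEntry is made up for the example:

```batch
@echo off
rem Assumption: one "set" line, and therefore one _<size> list, can hold at most 8191 characters.
rem 713 is the file count at which the original poster saw the script stop.
set /a "perEntry=8191 / 713"
echo Roughly %perEntry% characters per list entry
```

With integer division this gives 11, which is of the same order as a short quoted file name plus a separating space, so a ceiling near 713 files is at least consistent with a single variable running out of room.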

Re: hELP

Posted: 29 Oct 2019 04:59
by sajjansinghania
ShadowThief wrote:
22 Oct 2019 05:56
Get rid of that second setlocal enabledelayedexpansion and following endlocal; you don't need them and that's what's breaking the script
pieh-ejdsch wrote:
22 Oct 2019 20:56
Hello,
The variable _<filesize> grows too long: the names of all files of one size that differ in content are collected in a single variable per size. In your case there are very many (about 600) files of the same size with different content, and each name is about 6 characters long. I would redesign the script to build a list per file size to compare against. However, when so many files share one size, going through that list is no longer efficient, because every file in the list is checked against each new file. Is there some other criterion in the file name that shows up front that the content differs?
Sir, I have no knowledge of writing batch files. I would be grateful if code that I can save as a batch file could be posted here.
Regards.

Re: hELP

Posted: 03 Nov 2019 11:20
by pieh-ejdsch
Could you please change the thread title so that it describes your problem? "Help" is not helpful ...

I'm not sure if this script meets your requirements.
If you want to compare such a large number of files of the same size, you will make much better progress with the file hash.
According to your thread from January, you wanted to identify newer versions of one or more files.
That is why I asked whether the file name can be used to tell the versions of a file apart.
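
To illustrate the file-hash idea, here is a minimal sketch that is separate from the script posted below. It assumes that certutil.exe (shipped with current Windows versions) is on the PATH, that the folder to scan is passed as the first argument, and that file names contain no % or ^ characters. The hash_ variable prefix and the :check and :dup labels are made up for this example, and there is no error handling. Each file is hashed once, so the work grows with the number of files rather than with the number of pairs to compare.

```batch
@echo off
setlocal DisableDelayedExpansion
rem Sketch only: move files whose SHA-256 hash was seen before into TRASH.
pushd "%~1" || exit /b 1
if not exist "TRASH\" md "TRASH"
if not exist "TRASH\" exit /b 1
rem Remove leftover hash_ variables (hypothetical prefix used by this sketch).
for /f "delims==" %%v in ('2^>nul set hash_') do set "%%v="
rem Skip empty files and this script itself; hash everything else.
for %%f in (*) do if not "%%~zf"=="0" if /i not "%%~ff"=="%~f0" call :check "%%f"
popd
exit /b 0

:check
set "h="
rem certutil prints the hash on its second output line.
for /f "skip=1 delims=" %%h in ('certutil -hashfile "%~1" SHA256') do if not defined h set "h=%%h"
if not defined h exit /b
rem Older Windows versions print the hash with spaces between bytes.
set "h=%h: =%"
rem A hash seen before means a probable duplicate; otherwise remember it.
if defined hash_%h% goto dup
set "hash_%h%=1"
exit /b

:dup
echo duplicate: "%~1"
>nul move "%~1" "TRASH"
exit /b
```

Saved under a name of your choice, it could be called with the folder from the first post, for example with "E:\12" as the argument. Equal hashes make identical content overwhelmingly likely; a more cautious variant would confirm each candidate with fc, as the scripts in this thread do, before moving it.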

In any case, I ran this script over more than 2.4 million files (18.5 MB in total), of which about 15 thousand had different content.
After one day (24 h)!!!, 600,000 duplicates had been sorted out.
With an optional quick'n'dirty setting, the script ran a bit faster,
but this does not need to be used.

Code:

@echo off
setlocal
set prompt=$g$s
 rem versioned backups are moved to GOOD and identical copies to the trash
 rem  here: check for duplicate backup files
:: --------------------------------------------------------------------------
 rem Folder 

pushD "%~1"
:: --------------------------------------------------------------------------
set doubles=TRASH
set versions=GOOD
for %%i in (%doubles% %versions%) do if not exist "%%i\" ( md "%%i"
  if errorlevel 1 exit /b 1
)
set /a countFiles=countTrash=vergleiche=0
2>nul (for /f "delims==" %%i in ('set _^& set #') do @ set "%%i=")

 rem only files of the same size are compared with each other
 rem identical files (errorlevel = 0) are marked/moved/removed
 rem this approach can be used for backups (comparison/creation)

set "sorted=%temp%\allfiles"
2>nul del "%sorted%" "%sorted%log"
echo please wait ... create filelist ...
 rem The files are sorted by size and timestamp.
robocopy /L . ". only test ..\\" /njh /ts /ns /nc /np /ndl /njs /bytes /log:"%sorted%log"
:: NOTICE: /ndL noDirectoryListing still outputs the file path without the /fp switch
echo             ... sort filelist ...

 rem The /R switch preserves the latest files and removes old duplicates.
 rem To keep the oldest files, the /R switch must be removed.
sort /R "%sorted%log" /o "%sorted%"
echo             ... filelist sorted.
echo   ... compare files ...

 rem start a new list/variable once this number of different files is reached
 rem  this is necessary to avoid overflowing the variable
 rem  if there are more different files, this value should be reduced; it can speed up the search
set "getinList=200"

:: These are just quick and dirty settings that speed up the search but leave it INCOMPLETE
 rem tryLast == Compare only the most recently found unique files
 rem  set to 0 to compare all files OR set to 100 to compare only the last 100 files
set /a "tryLast=0"

 rem Only the last (now) list/variable is used for comparison.
 rem  leave blank to use all lists, or set to 1 to skip the old lists
set "splitList="

if %tryLast% equ 0 (set "tryLast=") else set "tryLast=|| ( 2>nul set/ai/=comp%%%tryLast% || set NOTtry=1)"
if defined splitList (set "n in (l) =%%n in (%%l)") else set "n in (l) =/l %%n in (%%l -1 0)"

:: which variable contains what?
 rem g  == listSorted
 rem h  == fileSize ( bytes )
 rem i  == fileTime
 rem j  == fullpathfileName (a b c d ab ad)
 rem #h == countSizeList 1 ...
 rem l  == #h
 rem #  == nextSizeList +1
 rem m  == #
 rem n  == actualSizeList
 rem old == sizeNow
set "old=0"
for %%g in ("%sorted%") do for /f "usebackQ tokens=1,3*" %%h in ("%%~g") do ( set "EX="
 set "comp="
 set "NOTtry="
 setlocal enabledelayedexpansion
 if !old! neq %%h call :clean :clean
 set /a "#%%h +=0, # =#%%h +1"
 for /f "usebackQtokens=1-2" %%l in ('!#%%h! !#!') do (
  for %n in (l) %do ( if :!! neq : setlocal enabledelayedexpansion
   for /f "usebackQtokens=1-2*" %%a in ('!countFiles! !countTrash! !_%%h#%%n!') do (
    endlocal
    if NOT defined NOTtry if NOT defined EX for %%d in (%%c) do if NOT defined NOTtry if NOT defined EX (
     set /a comp+=1
     >nul fc /lb1 "%%~nxj" "%%~d" && (
      echo  weg  %%j
      >nul move "%%~nxj" TRASH
      set /a countTrash+=1, EX=1
     ) %tryLast%
    )
    if %%n == %%l if NOT defined EX ( set /a "countfiles+=1"
     2>nul set/ai/=countFiles%%%getinList% || set "i="
     if NOT defined i ( set _%%h#%%m="%%~nxj"
      set "#%%h=%%m"
     )
     if defined i ( set _%%h#%%n="%%~nxj" %%c
      set "#%%h=%%l"
     )
    )
    title  Files  %%a  Trash  %%b
   )
  )
 )
 set "old=%%h"
)
title  Files %countFiles% trash %countTrash%  done
:clean
 rem here the lists for the different files are saved
2>nul md "%~dp0VersionLists"
for /f "delims=_#= tokens=1,2*" %%i in ('2^>nul set _') do >"%~dp0VersionLists\fc_%%i#%%j" echo %%k
if "%~1" == ":clean" exit /b 
pause
2>nul del "%sorted%" "%sorted%log"
exit /b
Originally this script was only meant to sort out a small number (about 200) of identical files.
But whatever...

Phil

Re: hELP

Posted: 03 Nov 2019 17:59
by sajjansinghania
At the very outset I must thank you, pieh-ejdsch Sir, for your invaluable help. You have solved my two-year-old problem. I thank God that I found DosTips.com and you.
Sir, being a below-average user, I am unable to understand the suggestions you made.
pieh-ejdsch wrote:
03 Nov 2019 11:20
Could you please change the thread title so that it describes your problem? "Help" is not helpful ...
I'm not sure if this script meets your requirements.
If you want to compare such a large number of files of the same size, you will make much better progress with the file hash.
According to your thread from January, you wanted to identify newer versions of one or more files.
That is why I asked whether the file name can be used to tell the versions of a file apart.
Having tested this batch file, I am fully happy with it. One request, if it is possible and not a problem: at present the first matched file is moved to TRASH. I would prefer the subsequent file or files to be moved to TRASH instead.
Thank you, Sir. Regards