
Need help synchronizing files between servers.

Posted: 04 Jun 2015 13:14
by Matt Williamson
I don't even know where to start with this one. I have two servers that I need to synchronize files between, but on one server the files are in folders and on the other they are not, and there are A LOT of files: about 924K on the first server (Windows Server 2003) and 914K on the second (Windows Server 2012 R2).

I did the initial copy about a month ago. After the copy, I ran a script that pulls the dates and times out of the file names and organizes the files into year/month folders. On the original server they're all still in one directory. The initial copy took place over a weekend, and I did it manually because it had to happen within a certain window when no other processes were running.

Now I have to come up with a way to do this on the fly, without using Task Scheduler, and have the copy from the old server go to the new server in the same structure it was in on the old server, within a certain window of time (11pm - 5am). I can then run my script to reorganize the files into the year/month format on the new server and decommission the old server. That is scheduled to happen on 6/14. Any suggestions would be greatly appreciated. Something simple like robocopy /mir obviously won't work for this.

-Matt

Re: Need help synchronizing files between servers.

Posted: 05 Jun 2015 02:51
by foxidrive
I wrote a simple script to move files from a mess into the same structure as the optimal server.
It relies on unique filenames to do the biz.

Time is your enemy here - and I don't quite follow the process.

You mentioned that you backed up a single folder, and organised it into folders on the backup system.

What do you want to do in the single folder now? It's going to have different files inside it I assume.
Do you want to organise the single folder of files on the primary system?

Re: Need help synchronizing files between servers.

Posted: 05 Jun 2015 07:18
by Matt Williamson
foxidrive wrote:What do you want to do in the single folder now? It's going to have different files inside it I assume.
Do you want to organise the single folder of files on the primary system?


Yes, the single folder will have more files in it now. I don't want to do anything in the single folder except copy all of the files that don't already exist on the other server - just the files that have been added since I did the original copy.

I was thinking of using a couple of for loops and comparing one directory against the other, but with that many files I'm sure it wouldn't be very efficient. Plus, there is that bug you mentioned before. Then there is the whole issue of scheduling it to run in the allotted time frame of 11pm-5am on weeknights without using Scheduled Tasks (that's a whole other issue, but it would take weeks to get an exception to use them).

Another idea is to write out both directories in bare format to files, use fc to compare them, write the differences to a new file, and then feed that to another script to do the copy. I don't know what the best way to do this is, though. It needs to be fast so I can run it at any given point. I'll probably run it a few times before the cutover so there are as few files as possible left to move.
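Roughly what I'm picturing for the compare step (untested sketch - z: is the old server's share mapped locally, and the d: path is the organized tree on the new server; parsing fc's output into a copy list would still be the hard part):

Code: Select all

@echo off
setlocal
set "new=d:\inetpub\ftproot\XMLs"

rem bare-format list of the flat folder on the old server
dir /b z:\*.xml > "%temp%\old_raw.txt"

rem the new server's files are in subfolders, so strip the paths down to names
(for /f "delims=" %%a in ('dir /b /s "%new%\*.xml"') do @echo %%~nxa) > "%temp%\new_raw.txt"

rem fc compares line by line, so both lists need the same sort order
sort "%temp%\old_raw.txt" > "%temp%\old.txt"
sort "%temp%\new_raw.txt" > "%temp%\new.txt"

rem fc flags the differing blocks; a second script would have to parse them into copy commands
fc "%temp%\old.txt" "%temp%\new.txt"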

Re: Need help synchronizing files between servers.

Posted: 05 Jun 2015 19:21
by foxidrive
Doing a dir on both servers and comparing the listings - though not with fc - would work, but the filenames would have to be unique, or else you'd have to compare the filename plus the date-time stamp. Unique filenames will make it faster to check for new files.

Re: Need help synchronizing files between servers.

Posted: 08 Jun 2015 08:20
by Matt Williamson
The file names are all unique. So, are we talking about something like this?

Code: Select all

@echo off
setlocal

set "old=z:\"
set "new=d:\inetpub\ftproot\XMLs"

for /f "delims=" %%a in ('dir /b "%old%*.xml"') do (
  set "found="
  for /f "delims=" %%b in ('dir /b /s "%new%\*.xml"') do (
    if /i "%%a"=="%%~nxb" set found=1
  )
  if not defined found echo copying %%a
)


None of the files have spaces. They look like this:

BA1805030001_ACCTV21_20150507_LIQ_1_1068471958.xml
ID1820580001_FEDOUT_20150407_PAY_1_106739717F.xml
OC1821070001_ACCTV21_20150407_OPN_0_106137791C.xml

Re: Need help synchronizing files between servers.

Posted: 08 Jun 2015 10:05
by Aacini
If there are a lot of files you should not use "FOR /F ... IN ('DIR..." because it is very inefficient. Try this:

Code: Select all

@echo off
setlocal

set "old=z:\"
set "new=d:\inetpub\ftproot\XMLs"

for %%a in (z:\*.xml) do (
   if not exist "%new%\%%~NXa" copy "%%a" "%new%"
)

Antonio

Re: Need help synchronizing files between servers.

Posted: 08 Jun 2015 11:27
by Matt Williamson
Thanks Antonio.

I don't think this will work, because the files on the new server are already in folders - unless the for is recursive and will look into them. That's why I was using dir /b /s. If I'm missing something, please let me know.

Thanks!

-Matt

Re: Need help synchronizing files between servers.

Posted: 10 Jun 2015 11:18
by foxidrive
This is pretty sloppy with the inner loop running completely for each file - but only you will be able to tell how long it takes, by testing it.

With the directory scan only happening once and then using files to compare, it may be faster,
and the for /f slowdown issue won't be a problem.

Not using echo or screen output will make it run faster when there are loads of small files.

Code: Select all

@echo off
dir d:\ /b /s /a-d >"%temp%\dfiles.txt"
dir z:\ /b /s /a-d >"%temp%\zfiles.txt"

md "d:\newfiles" 2>nul

for /f "usebackq delims=" %%z in ("%temp%\zfiles.txt") do (
  set "no="
  for /f "usebackq delims=" %%d in ("%temp%\dfiles.txt") do (
    if /i "%%~nxz" == "%%~nxd" set no=1
  )
  if not defined no copy "%%z" "d:\newfiles" >nul
)



But using find/findstr may be faster for so many files, especially if the files are on a ramdrive.

Re: Need help synchronizing files between servers.

Posted: 10 Jun 2015 11:30
by foxidrive
Aacini wrote:If there are a lot of files you should not use "FOR /F ... IN ('DIR..." because it is very inefficient.

Antonio


Is that because of the for /f exponential slowdown, with more files/longer names, Antonio?

Re: Need help synchronizing files between servers.

Posted: 10 Jun 2015 11:49
by Aacini
Try this:

Code: Select all

@echo off
setlocal

set "new=d:\inetpub\ftproot\XMLs"

(for %%a in (z:\*.xml) do echo %%~NXa) > oldFiles.txt
(for /R "%new%" %%a in (*.xml) do echo %%~NXa) > newFiles.txt

findstr /V /G:newFiles.txt oldFiles.txt

You may want to add the /I switch to the findstr command if the file names can differ in case.
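If you want to go from the listing to the actual copy, you may redirect the findstr output to a file and process it with one more loop; for example (just a sketch, assuming oldFiles.txt and newFiles.txt were created by the code above, and that z:\ is the flat source folder):

Code: Select all

@echo off
setlocal
set "new=d:\inetpub\ftproot\XMLs"

rem missing.txt = names present on z:\ but absent from the organized tree
findstr /V /I /G:newFiles.txt oldFiles.txt > missing.txt

rem copy each missing file from the flat old folder into the new root
for /f "usebackq delims=" %%a in ("missing.txt") do copy "z:\%%a" "%new%" >nul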

@foxidrive,
Remember that FOR /F stores the whole data created by the command before it starts processing it. If there are many files with long names, the required space may be very large. I read in another post that this space is assigned in chunks of a certain small size, and each time the space is not enough, the previous data is moved to a new, larger space!

Antonio

Re: Need help synchronizing files between servers.

Posted: 11 Jun 2015 00:42
by foxidrive
Aacini wrote:Try this:

Code: Select all

findstr /V /G:newFiles.txt oldFiles.txt


Your strategy is going to be light-years faster than my manual method.

@foxidrive,
Remember that FOR /F stores the whole data created by the command before it starts processing it
Antonio



I'm not sure if it applies to parsing a file, Antonio.

This code uses Dave's gettimestamp.bat to calculate elapsed time - do you see a flaw here?

Creating the file and parsing it seems to take less time than the for /r form, and the first test covers even more files, since dir /a-d includes hidden and system files that for /r skips.

Code: Select all

c:\>gett dir /b /s /a-d ^>"%temp%\a.txt"
0 days 00:00:17.924
Press any key to continue . . .

c:\>gett for /f "usebackq delims=" %a in ("%temp%\a.txt") do @rem
0 days 00:00:00.417
Press any key to continue . . .

c:\>gett for /r %a in (*) do @rem
0 days 00:00:26.022
Press any key to continue . . .



gett.bat

Code: Select all

@echo off
call getTimestamp -f {ums} -r t1
cmd /c %*
call getTimestamp -f {ums} -r t2
call getTimestamp -d %t2%-%t1% -f "{ud} days {hh}:{nn}:{ss}.{fff}" -u
pause

Re: Need help synchronizing files between servers.

Posted: 19 Jun 2015 05:10
by mcnd
Aacini wrote: I read in another post that this space is assigned in chunks of a certain small size, and each time the space is not enough, the previous data is moved to a new, larger space!


You are right: 4 KB blocks (at least in Windows 7). Each time the buffer is full, a new buffer 4 KB bigger is allocated, the old data is copied over, the old buffer is freed, and the whole thing repeats on every overflow.

foxidrive wrote:Creating the file and parsing it seems to take less time than the for /r form, and the first test covers even more files, since dir /a-d includes hidden and system files that for /r skips.


The file is fully loaded into memory, as you can see in the memory-usage graph (use a BIG file). In your test the file system cache returned the file from memory buffers, which is the reason for the low load time.

EDITED - Reading it again, I'm not sure I was clear. The memory allocation problem happens in the `for /f` command when retrieving data from a command. When reading a file, a buffer big enough is created and the data is read directly into memory.

Re: Need help synchronizing files between servers.

Posted: 19 Jun 2015 07:56
by foxidrive
mcnd wrote:The file is fully loaded into memory, as you can see in the memory-usage graph (use a BIG file). In your test the file system cache returned the file from memory buffers, which is the reason for the low load time.


I considered caching at the time and did it in reverse sequence also. No change was apparent.
Have you tested this?

Or are you saying that file a.txt is buffered and that's why it is faster?

Reversing the sequence here and now - it made a huge difference.
The for /f test is only 5-and-a-bit seconds in total when using usebackq with a file.
The result was different when I originally tested. Screwy Windows and caching.

You'll notice in the first test from my last post that the for /r test didn't show such a huge difference, so where is the caching effect there?
Edit: I did disable the Superfetch service today - I wonder if that affects a cmd prompt command? I wouldn't have expected it to.

The a.txt file is indeed 23 MB and over 258 thousand lines.

Code: Select all

a.txt   23,827,537   20/06/2015 00:00   -a--



Code: Select all

c:\>gett for /r %a in (*) do @rem
0 days 00:00:17.024
Press any key to continue . . .

c:\>gett dir /b /s /a-d ^>"%temp%\a.txt"
0 days 00:00:04.976
Press any key to continue . . .

c:\>gett for /f "usebackq delims=" %a in ("%temp%\a.txt") do @rem
0 days 00:00:00.367
Press any key to continue . . .

Re: Need help synchronizing files between servers.

Posted: 19 Jun 2015 12:41
by mcnd
foxidrive wrote:Or are you saying that file a.txt is buffered and that's why it is faster?


Yes. Just after a restart, with an idle machine, from a fresh cmd instance:

Code: Select all

19/06/2015  20:14        51.985.984 data.txt

D:> gett for /f "delims=" %a in (data.txt) do @rem
0 days 00:00:11.344
Presione una tecla para continuar . . .

D:> gett for /f "delims=" %a in (data.txt) do @rem
0 days 00:00:02.156
Presione una tecla para continuar . . .



And replacing the @rem with a pause, with the Task Manager memory chart open, you can see the memory allocation and how it drops when Ctrl-C is pressed.

As for the SuperFetch service, as far as I know it is intended to deal with executables, not data files (but I am not really sure).

Re: Need help synchronizing files between servers.

Posted: 19 Jun 2015 22:54
by foxidrive
mcnd wrote:
foxidrive wrote:Or are you saying that file a.txt is buffered and that's why it is faster?


Yes. Just after a restart, with idle machine, from a fresh cmd instance

Code: Select all

19/06/2015  20:14        51.985.984 data.txt

D:> gett for /f "delims=" %a in (data.txt) do @rem
0 days 00:00:11.344
Presione una tecla para continuar . . .

D:> gett for /f "delims=" %a in (data.txt) do @rem
0 days 00:00:02.156
Presione una tecla para continuar . . .




I'm a bit confused about the direction of what we are nattering about.

I'm comparing the speed of for /r with for /f "usebackq delims=" reading a file, when parsing a lot of files, as the thread had as part of the issue.

Memory allocation is involved, but I'll run another test to see if I can nail this down in a practical sense rather than a theoretical one, without disk caching being involved.