Start and wait for parallel batch jobs
Moderator: DosItHelp
I'd like to use a batch job to start multiple other batch jobs that will run in parallel and wait for all of them to finish. In each case I know how many are parallel but each set of parallel jobs may have a different number of parallel jobs - it is a basic job-controller for our dev environment.
I have found that I can ALMOST make it work by using WAITFOR, but it seems a bit sensitive to events happening too close together, as well as leaving "timing holes".
I've tried (simplified some):
In MainJob.bat:
--------------
SET Signal=ImDone
START "Batch A" A.bat %Signal%
START "Batch B" B.bat %Signal%
START "Batch C" C.bat %Signal%
START "Batch D" D.bat %Signal%
START "Batch E" E.bat %Signal%
REM Same Number of Waits as I have parallel jobs
WAITFOR /t 60 %Signal%
WAITFOR /t 60 %Signal%
WAITFOR /t 60 %Signal%
WAITFOR /t 60 %Signal%
WAITFOR /t 60 %Signal%
REM Next set of jobs... Etc.
--------------
... and In the A-E.bat:
--------------
REM Do Something
waitfor /s 127.0.0.1 /si %1
EXIT
--------------
If I add different timeouts or pause statements to space out the signal sends, it seems to work perfectly.
But since the real jobs may have varying durations to start with, I can't predict how close together they complete.
Is there a better way to do this type of synchronization? I'm trying to stay within the batch environment because we have built this out over time and use batch (temporary environment) variables through much of our processes. We are now taking it from a strictly sequential job to one with parallel jobs and sequences.
Re: Start and wait for parallel batch jobs
Take a look at my answer to Parallel execution of shell processes over at StackOverflow. It shows one reliable way to launch parallel jobs and wait for them to finish. It has a mechanism to specify a long list of jobs, yet limit the number of parallel processes.
Dave Benham
Re: Start and wait for parallel batch jobs
You may use a very simple synchronization trick that may be enough for your needs. The method consists of creating a separate file as a process identifier before each parallel job starts, and deleting that file just before the job ends. This way, the main program only has to wait for all pid files to disappear.
In MainJob.bat:
Code: Select all
del *.pid
for %%a in (A B C D E) do (
echo Anything > %%a.pid
START "Batch %%a" %%a.bat %%a
)
REM Wait for all processes to end
:wait
rem You may insert a ping delay here
if exist *.pid goto wait
... and In the A-E.bat:
Code: Select all
REM Do Something
del %1.pid
EXIT
See this post.
Antonio
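A minimal sketch of the same pid-file idea with the suggested ping delay filled in, plus a timeout so that a job that crashes without deleting its pid file cannot hang the main job forever. The job names, the MaxWait value and the labels are assumptions for illustration, not part of the original post; each job is still expected to run "del %1.pid" just before it exits.

```batch
@echo off
setlocal
rem Hypothetical job names; each job must delete its own %1.pid before EXIT.
set "Jobs=A B C D E"
rem Assumption: give up after roughly this many seconds.
set /a MaxWait=3600, Waited=0

rem Remove stale pid files left over from an earlier run.
del *.pid 2>nul
for %%a in (%Jobs%) do (
    echo running> "%%a.pid"
    start "Batch %%a" %%a.bat %%a
)

:wait
if not exist *.pid goto done
if %Waited% GEQ %MaxWait% goto timedout
rem ping -n 2 pauses for about one second and does not read stdin.
ping -n 2 127.0.0.1 >nul
set /a Waited+=1
goto wait

:timedout
echo Gave up after %Waited% seconds; still pending:
dir /b *.pid
exit /b 1

:done
echo All jobs finished.
exit /b 0
```

The delay keeps the loop from spinning at full CPU, and listing the remaining pid files on timeout shows which jobs never reported back.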
Re: Start and wait for parallel batch jobs
I prefer the built in feature, as this suffices for most tasks i've seen (Déjà-vu?):
Code: Select all
A.bat | B.bat | C.bat | D.bat | E.bat
rem you may also use the following if you want to see status messages of all processes.
rem A.bat>con | B.bat>con | C.bat>con | D.bat>con | E.bat
penpen
Re: Start and wait for parallel batch jobs
Thanks!
It looks like I have 4 workable methods to use:
Dave
- I like the fact that your solution can launch from a list of tasks - I may eventually get to the point of having a declarative batch setup. However, I have predefined parallel tracks, so some of the logic would need to be looked at for my need. There's not enough good information out there on how to implement parallel batch synchronization. Thanks for posting!
Antonio
- I like the simplicity of your solution. Why didn't I think of that? I guess I was trying to avoid using the file system. But the simplest solution often has the least opportunity for failure!
penpen
- I'm not sure I understand the solution - I'm not a batch expert, and would have thought it launches A.bat, pipes the output from it to B.bat, etc., resulting in serialized execution of the jobs. But if this really launches all 5 in parallel (i.e. separate processes/threads) and the main process/thread only continues after they all finish - that would be exactly what I'd want!!!
My own using WAITFOR
- I wound up implementing a two-way handshake that seems to work - not because I didn't want to use any of the solutions above, but because I forgot to mark my post with "notify of follow-ups" and in my impatience I kept working on my initial approach and I didn't see the replies until today. I'm not sure I can declare my approach completely safe or production quality (fortunately, I don't have to worry about that), but it has worked perfectly in testing my PoC and for several dev process runs at this time. Below is what I'm doing currently.
I use two sets of signals - a "private" (numbered) signal for each batch job running in parallel that is issued from the main batch job to instruct the parallel job that it can send its completion signal, and a shared completion signal that is sent from the parallel job(s) back to the main batch job to notify it of completion.
The main batch job loops, issuing the "private" "ReadyToReceiveDoneSignal" to one parallel job at a time, waiting for the "SharedDoneSignal" long enough to allow the parallel job to respond. It then logs/gathers all the responses, sorts them, and compares the list of received responses to the list of all expected responses.
I make extensive use of parameter passing to batch (environment) variables - so there are a few places I leverage this. The one thing I'm missing at this time is the ability to return values back to the main batch job. But that is another issue.
I also log to a file (used to log the overall progression of my job). It didn't wind up being a simple approach, and it does add a few seconds to each wait (but effectively only one or maybe two wait periods per section that launches parallel jobs) which makes it unsuitable for fast-running (subsecond) batches. In my case, I'm talking minutes for most of the tasks, and a total of 1-2 hrs for the whole job.
I made a few hand-made adjustments in this post - hope I didn't break anything. The %ChooseEnv% is simply a way to allow multiple developers to use the same batch and logic, and to separate some things, such as logging.
I do like two things about my approach (I'm not partial, of course) - AFAIK it doesn't require any additional permissions (other than for logging), and it could potentially be modified to synchronize jobs on multiple computers. But it is more code than I had wanted to use in the first place.
In the parallel jobs:
Code: Select all
...
REM after processing and error handling, before exiting, wait for "clear to send" and then send "completed"
IF "%Signal%"=="" (
ECHO Returning from %~n0 [sequential processing]
) ELSE (
ECHO Finished %~n0 [parallel processing] %Signal%
CALL :SignalDone %1 %2
EXIT
)
GOTO :eof
:SignalDone
WaitFor /t 120 %2
IF %ERRORLEVEL%==0 (
ECHO %~n0 Received Signal %2
) ELSE (
ECHO %~n0 Timed Out waiting for %2
GOTO :SignalDone
)
Timeout 1
ECHO %~n0 Sending %1 signal and exiting
waitfor /s 127.0.0.1 /si %1
GOTO :eof
In the main batch job:
Code: Select all
...
REM Initialize common part of Signal Names
SET ReadySignal=Ready
REM This is the expected "Completed" signal from each parallel task
SET SharedDoneSignal=Done
...
REM Initialize parallel job counter
SET SigNum=0
...
REM Sequential processing section ...
...
REM Parallel section:
REM First parallel job #
SET StartSigNum=%SigNum%
REM Pass the name without extension; :LaunchParallelBatNoExt appends .bat itself
CALL :LaunchParallelBatNoExt A
CALL :LaunchParallelBatNoExt B
CALL :LaunchParallelBatNoExt C
CALL :WaitForParallels %StartSigNum%
...
REM More processing - sequential and parallel as needed...
...
REM End of processing
GOTO :eof
:LaunchParallelBatNoExt
REM Clear ERRORLEVEL:
cmd /c "exit /b 0"
REM The parallel task will wait to be informed it can send the "completed" message
SET /A SigNum+=1
SET ReadyToReceiveDoneSignal=%ReadySignal%%SigNum%
REM This is the expected "Completed" signal from each parallel task
ECHO START "%1" %1.bat %SharedDoneSignal% %ReadyToReceiveDoneSignal% ^> %ChooseEnv%_%1.out >> %CD%/%ChooseEnv%_LogRunSteps.log
START "%1" %1.bat %SharedDoneSignal% %ReadyToReceiveDoneSignal% > %ChooseEnv%_%1.out
IF NOT %ERRORLEVEL%==0 (
ECHO %TIME% - #### ERROR IN #### Launching Constructed Batch %1.bat >> %CD%/%ChooseEnv%_LogRunSteps.log
) ELSE (
ECHO %TIME% - ++++ SUCCESS ++++ Launched Constructed Batch %1.bat >> %CD%/%ChooseEnv%_LogRunSteps.log
)
GOTO :eof
:WaitForParallels
cmd /c "exit /b 0"
REM Starting Signal Number was last one used in prior set - or 0. It is not a Signal Number used in this parallel set.
SET Loop=%1
SET LastSigNum=%SigNum%
SET SignalsReceived=x0x
Set CountReceived=0
REM 0 is not a valid signal entry - just used to initialize variable...
:WaitLoop
SET /A Loop+=1
ECHO %SignalsReceived% | FINDSTR "x%Loop%x" > nul
IF %ERRORLEVEL%==1 (
Waitfor /s 127.0.0.1 /si %ReadySignal%%Loop%
REM Reset ErrorLevel:
cmd /c "exit /b 0"
REM Allow 5 seconds for round trip response...
WAITFOR /T 5 %SharedDoneSignal%
IF !ERRORLEVEL!==0 (
Set /A CountReceived+=1
SET SignalsReceived=%SignalsReceived%%Loop%x
) ELSE (
ECHO X%Loop% Did not Signal
)
) ELSE (
ECHO %ReadySignal%%Loop% was previously received
)
REM ECHO %SignalsReceived% Last Checked Signal %Loop% out of %LastSigNum%, CountReceived = %CountReceived% or !CountReceived!
IF %Loop% LSS %LastSigNum% GOTO WaitLoop
SET Loop=%1
SET OrderedSignalsReceived=x0x
SET AllSignals=x0x
:SortSignalsReceived
ECHO %SignalsReceived% | FINDSTR "x%Loop%x" > nul
SET /A Loop+=1
SET AllSignals=%AllSignals%%Loop%x
REM Reset ErrorLevel:
cmd /c "exit /b 0"
ECHO %SignalsReceived% | FINDSTR "x%Loop%x" > nul
IF %ERRORLEVEL%==0 SET OrderedSignalsReceived=%OrderedSignalsReceived%%Loop%x
REM ECHO Sorted Signals %OrderedSignalsReceived% Last Checked Signal %Loop%
IF %Loop% LSS %LastSigNum% GOTO SortSignalsReceived
ECHO Comparing %OrderedSignalsReceived%==%AllSignals%
IF %OrderedSignalsReceived%==%AllSignals% GOTO DoneWaiting
REM Restart at the first signal number of this parallel set
SET Loop=%1
REM ECHO Next Iteration
GOTO WaitLoop
:DoneWaiting
ECHO All Signals received...
GOTO :eof
Re: Start and wait for parallel batch jobs
TBQ wrote:
penpen
- I'm not sure I understand the solution - I'm not a batch expert, and would have thought it launches A.bat, pipes the output from it to B.bat, etc., resulting in serialized execution of the jobs. But if this really launches all 5 in parallel (i.e. separate processes/threads) and the main process/thread only continues after they all finish - that would be exactly what I'd want!!!
Nearly. If you use "A.bat | B.bat" and A's piped output is not consumed by B.bat, then the pipe could run full.
Then "A.bat" is blocked until "B.bat" terminates: in this case the jobs execute serially (once the pipe is full).
But if you use, for example, the second option (>con), then the pipe always stays empty and the processes work in parallel.
This version has some unwanted effects (for example, \n\r is written to screen as replacement characters, and the output cannot be redirected again).
So I've created this sample batch file (test.bat) to demonstrate logging to different files, or redirecting output to stderr so that it can be redirected again:
Code: Select all
@echo off
setlocal
set "n=0"
goto :loadProcess%~1
echo The label "%~1" is not defined.
exit/B 1
:loadProcess
echo Started process: main
(>con "%~f0" 1) | (>con "%~f0" 2) | (>con "%~f0" 3) | (>con "%~f0" 4)
echo End of process: main
exit /B 0
:loadProcessLog
(
echo Started process: main
(>"test1.txt" "%~f0" 1) | (>"test2.txt" "%~f0" 2) | (>"test3.txt" "%~f0" 3) | (>"test4.txt" "%~f0" 4)
echo End of process: main
) > "test.txt"
exit /B 0
:loadProcessSTDERR
(
echo Started process: main
(>&2 "%~f0" 1) | (>&2 "%~f0" 2) | (>&2 "%~f0" 3) | (>&2 "%~f0" 4)
echo End of process: main
) >&2
exit /B 0
:loadProcess1
:loadProcess2
:loadProcess3
:loadProcess4
echo Started process: process%~1
goto :process%~1
echo The label "%~1" is not defined.
exit/B 1
:process1
ping 127.0.0.1 >nul
echo ^(%~1, %n%, 1^)
set /A "n+=1"
:process2
for /L %%a in (1, 1, 0xFFFF) do set "dummy=%%a"
echo ^(%~1, %n%, 2^)
set /A "n+=1"
:process3
echo ^(%~1, %n%, 3^)
set /A "n+=1"
:process4
echo ^(%~1, %n%, 4^)
set /A "n+=1"
if %n% LSS 20 goto :process1
echo End of process: process%~1
exit/B 0
Then start it using these command lines:
Code: Select all
test
test Log
test STDERR
test STDERR 2> STDERR.txt
The last command line behaves similarly, but it only creates "STDERR.txt".
penpen
Re: Start and wait for parallel batch jobs
penpen - excellent!
Simplified to my situation where one batch file launches others - and to avoid the garbled output, this seems to work wonders:
Main batch file invokes three other batch files in parallel with three parameters specifying the number of seconds for three wait ("timeout") periods in the called batch files (this is only to show the parallelism). I did find that I couldn't use TIMEOUT in the launched batch files - I guess because it consumes stdin. I will probably redirect the error output as well so that it is more obvious how the errors pertain to the processing that took place. For now - it helps illustrate the parallel nature of the processing.
Code: Select all
@ECHO Off
echo Started process: main
(P1.bat 3 15 7 > P1.out ) | (P2.bat 7 3 5 > P2.out ) | (P3.bat 5 3 3 > P3.out )
echo End of process: main
TYPE P*.out
pause
Each of the three P*.bat files (I could have used a single file, I know, but this is how I am testing...):
Code: Select all
@ECHO Off
ECHO %TIME% - Starting In %~n0
REM Timeout %1
WAITFOR /t %1 Timeout%~n0
ECHO %TIME% - Second Output In %~n0
REM Timeout %2
WAITFOR /t %2 Timeout%~n0
ECHO %TIME% - Third Output In %~n0
REM Timeout %3
WAITFOR /t %3 Timeout%~n0
ECHO %TIME% - Completing In %~n0
The results of the run clearly show the parallelism, and the output stays tidy:
Code: Select all
Started process: main
ERROR: Timed out waiting for 'TimeoutP1'.
ERROR: Timed out waiting for 'TimeoutP3'.
ERROR: Timed out waiting for 'TimeoutP2'.
ERROR: Timed out waiting for 'TimeoutP3'.
ERROR: Timed out waiting for 'TimeoutP2'.
ERROR: Timed out waiting for 'TimeoutP3'.
ERROR: Timed out waiting for 'TimeoutP2'.
ERROR: Timed out waiting for 'TimeoutP1'.
ERROR: Timed out waiting for 'TimeoutP1'.
End of process: main
P1.out
18:04:26.64 - Starting In P1
18:04:29.67 - Second Output In P1
18:04:44.69 - Third Output In P1
18:04:51.71 - Completing In P1
P2.out
18:04:26.64 - Starting In P2
18:04:33.68 - Second Output In P2
18:04:36.71 - Third Output In P2
18:04:41.73 - Completing In P2
P3.out
18:04:26.64 - Starting In P3
18:04:31.67 - Second Output In P3
18:04:34.69 - Third Output In P3
18:04:37.72 - Completing In P3
Press any key to continue . . .
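A minimal sketch of the stderr redirection mentioned above, assuming the same P1.bat to P3.bat test files: adding 2>&1 to each job's redirection should send its WAITFOR timeout messages into that job's own .out file instead of interleaving them on the console. This is an untested variation on the main batch file, not a confirmed result.

```batch
@echo off
echo Started process: main
rem 2>&1 sends each job's error stream to the same file as its standard output.
(P1.bat 3 15 7 > P1.out 2>&1) | (P2.bat 7 3 5 > P2.out 2>&1) | (P3.bat 5 3 3 > P3.out 2>&1)
echo End of process: main
type P*.out
pause
```

Keeping one file per job means the errors appear next to the timestamps of the step that produced them, which is the point of redirecting them at all.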
Re: Start and wait for parallel batch jobs
As i'm using win xp home 32 bit (i will probably update to win 7/8 soon), i'm not familiar with the "timeout" application.
The "pipe"-solution is just a simple solution and has its drawbacks, so if you need (different) STDIN (as it seems is needed for example by timeout: nice hidden use of STDIN - i like it -> something new to test),
then you have to use one of the other solutions.
Note that the input could be worked around in a similar way as the output:
Code: Select all
echo abc | (<con (set "input=" & set /P "input=process B needs input: " & set input))
But it probably has its disadvantages, too.
(I never needed to do that for more than one input, and have no idea of side effects.)
penpen
Re: Start and wait for parallel batch jobs
penpen - thanks for all your help!
I don't expect I'll need to worry about the input - except any debugging "PAUSE"s will probably fail. I can probably live with that if I can't get your technique below to work.
The TIMEOUT utility is part of the XP Resource kit (if that's still available). It works like a PAUSE with a time limit.
Anyway - I'll probably get back to this in the next go-around. 'Till then - keep up the good work!
Re: Start and wait for parallel batch jobs
penpen wrote: I prefer the built in feature, as this suffices for most tasks i've seen (Déjà-vu?):
Code: Select all
A.bat | B.bat | C.bat | D.bat | E.bat
rem you may also use the following if you want to see status messages of all processes.
rem A.bat>con | B.bat>con | C.bat>con | D.bat>con | E.bat
penpen
Linked to your answer on this SO question.