I was recently troubleshooting some issues we were having with Shippable, trying to get a bunch of our unit tests to run in parallel so that our builds would complete faster. I didn’t care what order the different processes completed in, but I didn’t want the shell script to exit until all the spawned unit test processes had exited. I ultimately wasn’t able to satisfactorily solve the issue we were having, but I did learn more than I ever wanted to know about how to run processes in parallel in shell scripts. So here I shall impart unto you the knowledge I have gained. I hope someone else finds it useful!

Wait

The simplest way to achieve what I wanted was to use the wait command. You simply fork all of your processes with &, and then follow them with a wait command. Behold:

#!/bin/sh

/usr/bin/my-process-1 --args1 &
/usr/bin/my-process-2 --args2 &
/usr/bin/my-process-3 --args3 &

wait
echo all processes complete

It’s really as easy as that. When you run the script, all three processes will be forked in parallel, and the script will wait until all three have completed before exiting. Anything after the wait command will execute only after the three forked processes have exited.

Pros

Damn, son! It doesn’t get any simpler than that!

Cons

With a bare wait there’s no way to get at the exit codes of the processes you forked. That was a deal-breaker for my use case, since I needed to know if any of the tests failed and return an error code from the parent shell script if they did. (You can recover each child’s status by recording its PID and waiting on the PIDs one at a time, as in the sketch below, but at that point you’ve traded away the one-liner simplicity.)
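If you do need the exit codes and want to stay dependency-free, here’s a minimal sketch of that bookkeeping, reusing the placeholder my-process-* commands from above:

#!/bin/sh

# Record each child's PID; wait PID returns that child's exit status.
pids=""
/usr/bin/my-process-1 --args1 & pids="$pids $!"
/usr/bin/my-process-2 --args2 & pids="$pids $!"
/usr/bin/my-process-3 --args3 & pids="$pids $!"

status=0
for pid in $pids; do
    wait "$pid" || status=$?
done

echo all processes complete
exit "$status"

This keeps the last non-zero status it sees, which is enough to make the parent script fail whenever any of the test processes fails.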

Another downside is that output from the processes will be all mish-mashed together, which makes it difficult to follow. In our situation, it was basically impossible to determine which unit tests had failed because they were all spewing their output at the same time.

GNU Parallel

There is a super nifty program called GNU Parallel that does exactly what I wanted. It works kind of like xargs in that you give it a collection of arguments and it runs a single command once for each of them, only it runs them in parallel instead of serially like xargs does (OR DOES IT??</foreshadowing>). It is super powerful, and all the different ways you can use it are beyond the scope of this article, but here’s a rough equivalent to the example script above:

#!/bin/sh

parallel /usr/bin/my-process-{} --args{} ::: 1 2 3
echo all processes complete

The official “10 seconds installation” method for the latest version of GNU Parallel (from the README) is as follows:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

Pros

If any of the processes returns a non-zero exit code, parallel will return a non-zero exit code. This means you can use $? in your shell script to detect if any of the processes failed. Nice! GNU Parallel also (by default) collates the output of each process, so you see each process’s complete output as it finishes instead of a mash-up of all the output interleaved as it’s produced. Also nice!
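For instance, here’s a sketch of failing the build when any job fails, using the same placeholder commands as before:

#!/bin/sh

# parallel's exit status is the number of jobs that failed (0 if none),
# so $? tells us whether the whole batch passed.
parallel /usr/bin/my-process-{} --args{} ::: 1 2 3
status=$?

if [ "$status" -ne 0 ]; then
    echo "$status process(es) failed" >&2
    exit "$status"
fi

echo all processes complete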

I am such a damn fanboy I might even buy an official GNU Parallel mug and t-shirt. Actually I’ll probably save the money and get the new Star Wars Battlefront game when it comes out instead. But I did seriously consider the parallel schwag for a microsecond or so.

Cons

Literally none.

Xargs

So it turns out that our old friend xargs has supported parallel processing all along! Who knew? It’s like the nerdy chick in the movies who gets a makeover near the end and it turns out she’s even hotter than the stereotypical hot cheerleader chicks who were picking on her the whole time. Just pass it a -Pn argument and it will run your commands using up to n processes at a time. Check out this mega-sexy equivalent to the above scripts:

#!/bin/sh

printf "1\n2\n3" | xargs -n1 -P3 -I{} /usr/bin/my-process-{} --args{}
echo all processes complete

Pros

xargs returns a non-zero exit code if any of the processes fails, so you can again use $? in your shell script to detect errors. The difference is that xargs returns 123 if any invocation of the command failed, whereas GNU Parallel’s exit status is (per its man page) the number of jobs that failed. Another pro is that xargs is most likely already installed on your preferred distribution of Linux.
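You can watch the 123 behavior in action with false standing in for a failing test process:

printf "1\n2\n3\n" | xargs -P3 -I{} false {}
echo $?    # prints 123 because at least one invocation failed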

Cons

The -P flag is an extension rather than something POSIX has traditionally guaranteed. The BSD xargs that ships with macOS and FreeBSD does support it, but a more minimal implementation may not, so you may or may not be out of luck with this option if you’re on AIX or some stripped-down system.

xargs also has the same problem as the wait solution where the output from your processes will be all mixed together.

Another con is that xargs is a little less flexible than parallel in how you specify the processes to run. You have to pipe your values into it, and if you use the -I argument for string-replacement then your values have to be separated by newlines (which is more annoying when running it ad-hoc). It’s still pretty nice, but nowhere near as flexible or powerful as parallel.
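To make that concrete, here’s the same trivial three-job run both ways, with echo standing in for a real command:

# parallel takes its values right on the command line:
parallel echo job {} ::: 1 2 3

# xargs wants them piped in, one value per line, when you use -I:
printf "1\n2\n3\n" | xargs -P3 -I{} echo job {}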

Also there’s no place to buy an xargs mug and t-shirt. Lame!

And The Winner Is…

After determining that the Shippable problem we were having was completely unrelated to the parallel scripting method I was using, I ended up sticking with parallel for my unit tests. Even though it meant one more dependency on our build machine, the ease of passing arguments ad-hoc on the command line and the collated output were just too nice to pass up.