Parallel Processing from the Command Line

[Image: Parallel Tracks. License: CC BY 3.0]

I’m one of those people who do a heck of a lot of their computing from the command line. I enjoy the OS X GUI and use a lot of GUI-based applications, but I’m more comfortable with a terminal window and feel more productive in that environment.

I do a lot of batch processing. Some of these batches consist of jobs that are processor-intensive (for example: imaging, data munging, encryption, compression/decompression) while others consist of long series of less expensive tasks.

In either case, one thing I don’t want is to be performing these tasks sequentially on a single core of my eight-core MacBook. No… what I really want is to distribute this work across all eight cores and even, where possible, across the countless cores of the other computers on my network.

By chance, I came across GNU Parallel. This awesome tool executes shell jobs concurrently, on one or more computers, using one, several or all of the processor cores available to it.
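To give a flavour of the syntax, here is a minimal sketch that compresses a handful of files with one gzip job per file. The scratch directory and the sample_N.txt file names are made up for illustration, and the xargs -P branch is only an assumed fallback for machines where GNU Parallel isn't installed; it is not part of GNU Parallel itself.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directory with a few sample files (hypothetical data, for illustration only).
workdir=$(mktemp -d)
for i in 1 2 3 4; do
  seq 1 20000 > "$workdir/sample_$i.txt"
done

if [[ "$(parallel --version 2>/dev/null)" == *"GNU parallel"* ]]; then
  # Everything after ::: is an input; GNU Parallel starts one gzip job per input
  # and runs several of them at once, depending on the CPUs available.
  parallel gzip ::: "$workdir"/sample_*.txt
else
  # Assumed fallback when GNU Parallel is missing: xargs runs up to 4 gzip jobs at once.
  printf '%s\0' "$workdir"/sample_*.txt | xargs -0 -n 1 -P 4 gzip
fi

# Count the compressed results, then tidy up.
compressed_count=$(find "$workdir" -name '*.gz' | wc -l | tr -d ' ')
echo "compressed files: $compressed_count"
rm -rf "$workdir"
```

The same pattern should carry over to most batch loops: whatever varied inside the `for` loop becomes an input after `:::`, and `-j N` caps the number of simultaneous jobs if the default is too greedy.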

This can speed up batch processing considerably. Using all eight cores of my MacBook instead of a single one suggests that a batch process could be up to 8× faster than it would otherwise be.[1] Of course, in the real world it doesn’t work out this way. The operating system and other running tasks make their own demands on the resources of the computer (or cluster), and Parallel itself has some start-up overhead. Nevertheless, you can realise significant and measurable improvements in performance just by adding parallel to your batch pipeline.
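A rough way to see the difference is to time the same batch both ways. The sketch below is only an illustration under stated assumptions: `sleep` stands in for a real job, the four one-second durations are invented, and the xargs -P branch is an assumed fallback for machines without GNU Parallel. Because `sleep` uses no CPU, this shows concurrency rather than a true CPU-bound speed-up, which will depend on core count and the overheads mentioned above.

```shell
#!/usr/bin/env bash
set -eu

# Four stand-in jobs, each "working" for one second (hypothetical workload).
durations=(1 1 1 1)

# Sequential baseline: run the jobs one after another.
start=$(date +%s)
for d in "${durations[@]}"; do
  sleep "$d"
done
sequential_secs=$(( $(date +%s) - start ))

# Same jobs again, up to four at a time.
start=$(date +%s)
if [[ "$(parallel --version 2>/dev/null)" == *"GNU parallel"* ]]; then
  # {} is replaced by each input that follows :::
  parallel -j 4 sleep {} ::: "${durations[@]}"
else
  # Assumed fallback when GNU Parallel is missing.
  printf '%s\n' "${durations[@]}" | xargs -n 1 -P 4 sleep
fi
parallel_secs=$(( $(date +%s) - start ))

echo "sequential: ${sequential_secs}s, parallel: ${parallel_secs}s"
```

The timings use whole seconds from `date +%s`, so they are coarse; for real batches, wrapping each variant in the shell's `time` keyword gives a finer comparison.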

I’ve spent the weekend adapting my collection of batch scripts to use GNU Parallel where applicable and I’ve seen some great speed improvements. Now I want to put together a little cluster so that I can experiment with processing across multiple machines. This stuff is fun!

  1. GNU Parallel;
  2. Man Page;
  3. GNU Parallel: The Command-Line Power Tool (PDF);
  4. Introductory Videos.

  [1] I have to say, it’s rather cool seeing all the computer’s cores maxed out in htop.