I’m one of those people who do a heck of a lot of their computing from the command line. I enjoy the OS X GUI and use a lot of GUI-based applications, but I’m more comfortable with a terminal window and feel more productive in that environment.
I do a lot of batch processing. Some of these batches consist of jobs that are processor-intensive (for example: imaging, data munging, encryption, compression/decompression), while others consist of long series of less-expensive tasks.
In either case, one thing I don’t want is to be performing these tasks sequentially on a single core of my eight-core MacBook. No… what I really want is to distribute this work across all eight cores and even, where possible, across the countless cores of the other computers on my network.
By chance I came across GNU Parallel. This awesome tool executes shell jobs concurrently, spreading them across multiple cores and even across multiple machines.
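As a taste of what that looks like, here's a minimal sketch (the .log files are just a stand-in for whatever your batch happens to be):

```bash
# Compress every .log file in the current directory, running one job
# per CPU core (GNU Parallel's default level of concurrency).
parallel gzip ::: *.log

# The same idea, feeding the file names in on stdin instead of with :::
ls *.log | parallel gzip
```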
This can increase the speed of batch processing by orders of magnitude. Using all eight cores of my MacBook instead of a single one suggests that a batch process could be 8× faster than it would otherwise be.1 Of course, in the real world it doesn't work out quite that way: the operating system and other running tasks make their own demands on the computer's resources, and parallelization itself adds some overhead to your batch pipeline.
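In practice it helps to leave the machine a little headroom. GNU Parallel's --jobs option caps the number of simultaneous jobs, either as an absolute count or relative to the core count; the commands below are illustrative rather than prescriptive:

```bash
# Run at most 6 jobs at a time and print an ETA while the batch runs.
parallel --jobs 6 --eta gzip ::: *.log

# Alternatively, express the cap relative to the number of cores:
# -j-1 means "one fewer simultaneous job than I have cores".
parallel -j-1 gzip ::: *.log
```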
I’ve spent the weekend adapting my collection of batch scripts to use GNU Parallel where applicable and I’ve seen some great speed improvements. Now I want to put together a little cluster so that I can experiment with processing across multiple machines. This stuff is fun!
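GNU Parallel can already drive remote machines over SSH, which is what makes the cluster idea so appealing. A rough sketch of that kind of invocation might look like the following (the host names are placeholders, and I'm assuming passwordless SSH to each of them):

```bash
# Spread the jobs across the local machine (the special sshlogin ':')
# and two remote hosts. For each input file, --transferfile copies it
# to the worker, --return fetches the named result back, and --cleanup
# removes the temporary copies afterwards.
parallel -S :,user@host1,user@host2 \
         --transferfile {} --return {}.gz --cleanup \
         gzip {} ::: *.log
```

The host list can also live in a file passed with --sshloginfile, which is probably how I'll describe the little cluster once it exists.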
Links
- GNU Parallel: https://www.gnu.org/software/parallel/
1. I have to say, it’s rather cool seeing all the computer’s cores maxed-out in htop. ↩︎