07 December 2007

Multithreading - Make Me Parallel!

Sometimes being multithreaded isn't enough. Running several tasks simultaneously is great and all, but it depends how heavyweight each task is: you could be spending a lot of calculation time inside a single task and getting little more than a responsive GUI out of it (not making the most of the computing power). What you really want to do is divide those tasks into their smallest components and parallelise those.

One of the tools freely available to C++ developers for this kind of job is Threading Building Blocks (TBB), which was open-sourced this year by Intel. In fact, I am pretty sure I mentioned this before, but I am nearly thirty so my brain ain't what it used to be. There are actually two parts to the library: a thread-safe set of allocators, and the multithreading library itself, which includes algorithms and thread-safe containers.

It is designed with the STL in mind (I suppose the developers would hope for an eventual inclusion into the standard library). It seems to be pitched at a higher semantic level than a lot of other threading libraries, which should help people get started with it much faster.

The most obvious place to try out some of the parallelisation is the "parallel_for" loop. Interestingly, you can tune the speed with a "grain size", roughly the smallest chunk of the loop worth handing out as a separate piece of work. It is a trade-off between the overhead of performing parallel operations and making use of the processor(s): too small and you spend your time on bookkeeping, too large and some cores sit idle. It all works in a similar fashion to the normal STL algorithms and containers, with some additional classes like "ranges" that hold the begin and end of your divided-up operation. In fact it all seems strangely easy to do. If you have to do calculations and construct new data, stick the results in one of the thread-safe containers and you will be laughing. Admittedly, if you use "normal loops" in your code you need to STL-ise them, but turning a loop body into a functor is a fairly trivial operation.
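To make the range and grain-size idea a bit more concrete, here is a minimal sketch in standard C++ only. It is not TBB and not how TBB is implemented (TBB has its own task scheduler; this just spawns threads through std::async). The names BlockedRange, SquareAll and parallel_for_sketch are hypothetical stand-ins of mine; in TBB itself the corresponding pieces are tbb::parallel_for and tbb::blocked_range. The point is only the shape: the loop body becomes a functor that takes a sub-range, and the range keeps getting split in half until a piece is no bigger than the grain size.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for a half-open range [begin, end) plus a grain size.
struct BlockedRange {
    std::size_t begin;
    std::size_t end;
    std::size_t grain;

    std::size_t size() const { return end - begin; }

    // Worth splitting only while the piece is larger than the grain size.
    // The size() > 1 check stops a grain of 0 from splitting forever.
    bool is_divisible() const { return size() > grain && size() > 1; }
};

// The "STL-ised" loop body: a functor that processes a whole sub-range.
struct SquareAll {
    std::vector<double>* data;

    void operator()(const BlockedRange& r) const {
        for (std::size_t i = r.begin; i != r.end; ++i) {
            (*data)[i] *= (*data)[i];
        }
    }
};

// Recursively halve the range; run one half on another thread and the other
// half on the current thread. Pieces at or below the grain size run serially.
template <typename Body>
void parallel_for_sketch(const BlockedRange& r, const Body& body) {
    assert(r.begin <= r.end);

    if (!r.is_divisible()) {
        body(r);
        return;
    }

    const std::size_t mid = r.begin + r.size() / 2;
    const BlockedRange left{r.begin, mid, r.grain};
    const BlockedRange right{mid, r.end, r.grain};

    auto left_done = std::async(std::launch::async,
                                [&] { parallel_for_sketch(left, body); });
    parallel_for_sketch(right, body);
    left_done.get();  // wait for the other half before returning
}
```

Calling parallel_for_sketch(BlockedRange{0, v.size(), 1000}, SquareAll{&v}) on a std::vector<double> v squares every element in place. Each element is touched by exactly one sub-range, so the functor needs no locking. A larger grain means fewer, bigger pieces and less overhead; a smaller grain means more pieces to spread across cores.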

All in all, though, it looks quite good, and easy to use if you are parallelising simple loops. I am sure some programs will find substantial speed-ups and blow up a few CPUs with over-use from that. I could tell you a story about Intel tech support saying we shouldn't run a Pentium III CPU at 100% use consistently back in the day...