How many threads?
An aside that came up during a discussion on thread pools was how many threads you should have in your application. This really has no straightforward answer other than “it depends”.
Many would argue that on most systems you really should be working at a level of abstraction where the library or programming language deals with this sort of detail under the hood. However, that is not much help to those of us stuck with PThreads or similar low-level APIs. So let's look at what you should consider.
- Look at the number of cores. Setting num_threads = num_cores is not a good idea in general. Any thread that blocks (on I/O or a lock, say) leaves its core idle, and code tuned to one core count can run into scalability issues as the number of cores varies. However, for applications targeting a well-specified architecture this can allow tuning for very high performance.
- Hardware threads. Many cores support hardware threading, where the core switches between thread contexts very quickly to mask cache stalls. Tailoring the number of threads to take advantage of this can yield great performance gains.
- How many ways can you actually partition your algorithm? This is the biggest question. If you go for pipeline partitioning, you will quickly hit scaling problems, because the number of useful threads is capped by the number of pipeline stages. If you go for data partitioning, you may create far more threads than you intended and run into problems as they contend for resources.
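- Memory hierarchy. How is the cache going to react to frequent thread swapping? If each thread has a large working set, you may spend all your time thrashing the cache. Then there is false sharing, where threads writing to different variables that happen to sit on the same cache line force that line to bounce between cores.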
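The first two points can be turned into a starting value rather than a fixed rule. The sketch below assumes a POSIX-like system and uses a widely quoted rule of thumb: threads = cores * (1 + wait time / compute time). This rule oversubscribes the cores in proportion to how long each thread spends blocked. The names suggested_threads and online_cpus are hypothetical. sysconf(_SC_NPROCESSORS_ONLN) is a common extension (Linux, macOS, the BSDs) rather than something POSIX guarantees. It counts logical processors, so hardware threads are already included in the figure.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper. Rule of thumb: threads = cores * (1 + W/C), where
 * wait_ratio = W/C is the time a thread spends blocked divided by the time
 * it spends computing. wait_ratio == 0 (purely CPU-bound) gives one thread
 * per core; threads that block often get extra threads so no core idles. */
long suggested_threads(long cores, double wait_ratio)
{
    assert(cores > 0 && wait_ratio >= 0.0);
    long threads = (long)((double)cores * (1.0 + wait_ratio) + 0.5); /* round to nearest */
    return threads < 1 ? 1 : threads;
}

/* Hypothetical helper: number of logical processors currently online.
 * This counts hardware threads (e.g. SMT siblings), not just physical cores.
 * _SC_NPROCESSORS_ONLN is a widely supported extension, not required by
 * POSIX; fall back to 1 if the query fails. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

For example, a CPU-bound pool would use suggested_threads(online_cpus(), 0.0). A pool whose threads spend about as long waiting on I/O as computing would use a wait ratio of 1.0, giving twice as many threads as logical CPUs. The wait ratio still has to be measured on the target system, and benchmarking around the suggested value remains the real test.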
There are many more issues around this, and their impact depends very much on the target system. This illustrates again how the move to multicore is forcing software engineers to consider architecture and performance far earlier in the design cycle than before.
The best advice for developers whose software will be ported to many platforms is to make their parallel implementation as scalable as possible. It should have clean interfaces between tasks so that it maps easily onto whatever concurrency mechanism the target system provides. That is: make your code multicore-ready before going to multicore.
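That advice can be made concrete with a small pattern. The sketch assumes POSIX threads, a C11 compiler and 64-byte cache lines. The names range_task, run_partitioned and sum_range are hypothetical and not part of any library. The task only ever sees an index range, so the same code runs unchanged with one thread or many and could later be mapped onto a different concurrency mechanism. Each worker writes its partial result to its own cache-line-aligned slot, which sidesteps the false sharing mentioned earlier.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64
#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* A task sees only an opaque context and a half-open range [begin, end).
 * It knows nothing about threads, so the thread count stays a parameter. */
typedef double (*range_task)(const void *ctx, size_t begin, size_t end);

/* One result slot per worker, aligned to (and therefore sized as) a whole
 * cache line so neighbouring workers never write to the same line. */
struct slot {
    _Alignas(CACHE_LINE) double value;
};

struct job {
    range_task task;
    const void *ctx;
    size_t begin, end;
    struct slot *out;
};

static void *worker(void *arg)
{
    struct job *j = arg;
    j->out->value = j->task(j->ctx, j->begin, j->end);
    return NULL;
}

/* Split [0, n) into nthreads contiguous chunks, run each chunk on its own
 * pthread and return the sum of the partial results. If a thread cannot be
 * created, that chunk is run in the calling thread instead. */
double run_partitioned(range_task task, const void *ctx, size_t n, int nthreads)
{
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    int started[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    struct slot slots[MAX_THREADS];

    for (int t = 0; t < nthreads; t++) {
        jobs[t] = (struct job){
            .task = task,
            .ctx = ctx,
            .begin = n * (size_t)t / (size_t)nthreads,
            .end = n * (size_t)(t + 1) / (size_t)nthreads,
            .out = &slots[t],
        };
        started[t] = pthread_create(&tid[t], NULL, worker, &jobs[t]) == 0;
        if (!started[t])
            worker(&jobs[t]);   /* fallback: run this chunk inline */
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += slots[t].value;
    }
    return total;
}

/* Example task: sum the elements of a double array over [begin, end). */
double sum_range(const void *ctx, size_t begin, size_t end)
{
    const double *a = ctx;
    double s = 0.0;
    for (size_t i = begin; i < end; i++)
        s += a[i];
    return s;
}
```

Compile with -pthread. A cheap first check is to run the same task with several thread counts and confirm the answer does not change. For an array holding 0.0 to 999.0, run_partitioned(sum_range, a, 1000, nthreads) should return 499500.0 for any nthreads from 1 to 64. Exact comparison only works here because these partial sums are exact in double precision. In general, floating-point results can differ slightly with the partitioning. Timing the same runs is then the actual scalability test.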


