Algorithms in the real world
With the release of Prism, we provide a recommended flow for migrating sequential code to parallel. One of the first things we suggest is to understand the intent and code structure of your application and make sure it’s well optimized.
No matter how well optimized though, when you go to parallelize, there are typically some fundamental data dependencies which will serialize parts of your application. The standard advice is to consider switching algorithms to one that is more parallel friendly. It sounds good, but many applications I’ve worked on, for example, h.264 video decode, have algorithms prescribed by standard.
I recently took a look at parallelizing the Dijkstra algorithm as part of a network routing application.

The algorithm works to identify shortest paths in a network by following edges and keeping track of the minimum cost to get to each node. When starting from a source node, the algorithm maintains a priority queue of active nodes, choosing the minimum distance node as the next node to start form as the algorithm traverses the edges. This gives a worst case performance of:
O(|E| + |V|log|V|)
where |E| is the number of edges, and |V| is the number of vertices (nodes). The |E| part is easy to understand; all the reachable edges should be considered. The |V|log|V| part is trickier- that one depends on the priority queue implementation. Each time a node is added, the queue must be reprioritized. A binary heap implementation requires O(log|V|) swaps, and there are |V| nodes. That explains it.
But, I’m free to choose other implementations- a simple list would work but require O(|V|) searches to find the minimum distance node each time. A Fibonacci heap priority queue might even have better performance, statistically speaking, than a binary heap. That statistical part finally made me wake up. I realized that when thinking about changing algorithms, I’d always been thinking about going to something that was considered universally better, like the difference between a quicksort and a bubble sort. In practice, for Dijkstra, it appears that the Fibonacci queue often performs worse than the binary heap.
The best choice algorithm depends on the properties of the graph being analyzed. The real world can be so messy, but that makes room for good engineering and job security!
So imagine my surprise when I picked up the serial Dijkstra benchmark code inside MiBench and discovered it didn’t even bother to reprioritize the nodes in the queue, it just used the oldest nodes first in a simple queue. With a little thinking, you realize that if you reprioritize to always select the minimum distance node first, you are guaranteed not to have to revisit previous nodes. You lose that benefit if you just take the oldest node first, so the worst case performance appears to me to be really bad:
O(|E||V|)
because at each node you might have to backtrack over previously traversed edges. That sounds wasteful, but the real world may rescue us. If edges all have similar costs, then the algorithm will add nodes with the fewest node hops first, and that will greatly reduce the number of backtracked edges needed. With no reprioritization and no backtracking, the best case performance for this simpler implementation would become:
O(|E| + |V|)
Real world results will vary, but for typical routing graphs, I bet they are closer to O(|E| + |V|).
So I have a new found appreciation for ‘changing algorithms’. Will it help my parallelization? I’ll let you know.
- Skip