Archive for January, 2009

Thanks for the Memory

January 23rd, 2009

When retrofitting a sequential application with threads, there are many potential problems; however, it’s always the side effects that seem to catch you out with unexpected bugs. The situation gets worse when you are attempting to crowbar your application into the resource-constrained environment of an embedded system.

Threads have overhead. Here I’m talking about PThreads, but the same is true for most threading libraries. There is design overhead, code overhead, maintenance overhead, runtime overhead and, of course, memory overhead, which is the one you really have to look out for.

PThreads are basically lightweight processes, so they have to carry around a good deal of state, and chief among that state is the stack. In a typical implementation, the stack for each thread is allocated on the heap when the thread is created and has a fixed size. This sounds straightforward enough, but it can immediately cause several failure modes.

  • The per-thread stack size multiplied by the number of threads is greater than the available heap space, resulting in thread creation failure.
  • There is enough space on the heap for all of the thread overhead, but the next malloc call fails because there is no room left.
  • The per-thread stack is too small for the amount of data that needs to be pushed onto it, leading to any number of obscure errors.
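The first two failure modes are really a budgeting problem, so it can help to do the arithmetic before creating any threads. The following is a minimal sketch, assuming you know roughly how much heap you have; the function name stacks_fit and its parameters are made up for illustration and are not part of PThreads.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if num_threads stacks of stack_size bytes
 * fit into a heap of heap_size bytes while leaving `reserve` bytes free
 * for ordinary malloc calls, and 0 otherwise. */
int stacks_fit(size_t stack_size, size_t num_threads,
               size_t heap_size, size_t reserve)
{
    if (reserve > heap_size)
        return 0;                 /* the reserve alone does not fit */
    if (stack_size == 0)
        return 1;                 /* nothing to allocate */

    size_t available = heap_size - reserve;

    /* Divide rather than multiply so the check cannot overflow. */
    return num_threads <= available / stack_size;
}
```

For example, with a 256 KB heap of which 64 KB must stay free, stacks_fit(16 * 1024, 8, 256 * 1024, 64 * 1024) reports that eight 16 KB stacks fit, while the same call with 64 KB stacks reports that they do not.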

So what can you do about this? Fortunately, PThreads provides a solution via its thread attributes API. If you are not familiar with attributes, a pthread_attr_t is simply a structure for configuring a thread. It is passed to pthread_create as the second parameter, which is otherwise set to NULL, as below.


pthread_t aThread;
pthread_attr_t myAttr;
pthread_attr_init(&myAttr);
/* myFunc must be declared as: void *myFunc(void *arg); */
pthread_create(&aThread, &myAttr, myFunc, (void *) myParams);
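For completeness, here is a slightly fuller sketch of the same pattern with the return values checked and the attribute object cleaned up. The worker double_value and the wrapper run_in_thread are hypothetical names chosen for this example; only the pthread_* calls come from the library.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical worker: doubles the int it is handed. */
static void *double_value(void *arg)
{
    int *value = arg;
    *value *= 2;
    return NULL;
}

/* Runs double_value on *value in a new thread and waits for it to finish.
 * Returns 0 on success or the pthread error number on failure. */
int run_in_thread(int *value)
{
    pthread_attr_t attr;
    pthread_t thread;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_create(&thread, &attr, double_value, value);

    /* The attribute object is not needed once pthread_create has returned. */
    pthread_attr_destroy(&attr);

    if (rc != 0) {
        fprintf(stderr, "pthread_create failed: %s\n", strerror(rc));
        return rc;
    }
    return pthread_join(thread, NULL);
}
```

Note that the pthread functions return the error number directly rather than setting errno, which is why the result is passed straight to strerror.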

Once the pthread_attr_t variable is initialized, you can use it to specify the size of the stack you require (among other things). However, it is not always that straightforward (is anything?): you first need to test whether your PThreads library actually supports a variable stack size, and then check the minimum allowable stack size before you set it. Both are exposed through macros defined in your system’s headers.


/* Needs <pthread.h>, <stdio.h>, <limits.h> (PTHREAD_STACK_MIN)
   and <unistd.h> (_POSIX_THREAD_ATTR_STACKSIZE). */
pthread_attr_t myAttr;
pthread_attr_init(&myAttr);
size_t myStackSize;
#ifdef _POSIX_THREAD_ATTR_STACKSIZE
pthread_attr_getstacksize(&myAttr, &myStackSize);
printf("Default Stack is %zu Bytes.\n", myStackSize);
myStackSize = 16*1024; /* Try to set stack to 16K */
if (myStackSize >= PTHREAD_STACK_MIN)
    pthread_attr_setstacksize(&myAttr, myStackSize);
else
    printf("PANIC! New Stack size too small!\n");
#else
printf("PANIC! Cannot set stacksize.\n");
#endif
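If you would rather fall back to the minimum than panic, the check can be folded into a small helper. This is only a sketch under a few assumptions: the name init_attr_with_stack is invented here, and some platforms may additionally require the size to be a multiple of the page size, in which case pthread_attr_setstacksize returns an error that the caller has to handle.

```c
#include <limits.h>    /* PTHREAD_STACK_MIN, where the platform defines it */
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initializes *attr and requests a stack of `wanted`
 * bytes, raising the request to PTHREAD_STACK_MIN if it is too small.
 * On success returns 0 and stores the size the library reports in *actual;
 * on failure returns the pthread error number and leaves *attr destroyed. */
int init_attr_with_stack(pthread_attr_t *attr, size_t wanted, size_t *actual)
{
    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

#ifdef PTHREAD_STACK_MIN
    if (wanted < (size_t)PTHREAD_STACK_MIN)
        wanted = (size_t)PTHREAD_STACK_MIN;
#endif

    rc = pthread_attr_setstacksize(attr, wanted);
    if (rc == 0)
        rc = pthread_attr_getstacksize(attr, actual);
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

The initialized attribute can then be passed to pthread_create as before and released with pthread_attr_destroy once the threads have been created.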

So there you go. Each thread can now have a custom stack size tailored to its own needs, minimizing the total thread overhead of the system. Of course, the problems don’t end there. Many of the modifications made when parallelizing serial code, particularly adding thread-local buffers to improve performance, will further increase the memory footprint of your application, and unexpected combinations of threads may result in memory usage spikes. So, on behalf of future code maintainers everywhere: please, please check the return value of malloc() and friends for failures!
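As an illustration of that last plea, here is one way per-thread buffers might be allocated with every result checked. It is a sketch rather than a recipe; alloc_thread_buffers and free_thread_buffers are hypothetical names, and num_threads is assumed to be greater than zero.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocates one buffer of buf_size bytes for each of
 * num_threads workers. If any allocation fails, everything allocated so far
 * is released and NULL is returned, so the caller sees a single failure. */
char **alloc_thread_buffers(size_t num_threads, size_t buf_size)
{
    char **bufs = calloc(num_threads, sizeof *bufs);
    if (bufs == NULL) {
        fprintf(stderr, "out of memory for buffer table\n");
        return NULL;
    }

    for (size_t i = 0; i < num_threads; i++) {
        bufs[i] = malloc(buf_size);
        if (bufs[i] == NULL) {
            fprintf(stderr, "out of memory for buffer %zu\n", i);
            while (i > 0)
                free(bufs[--i]);   /* undo the earlier allocations */
            free(bufs);
            return NULL;
        }
    }
    return bufs;
}

/* Releases everything returned by alloc_thread_buffers. */
void free_thread_buffers(char **bufs, size_t num_threads)
{
    if (bufs == NULL)
        return;
    for (size_t i = 0; i < num_threads; i++)
        free(bufs[i]);
    free(bufs);
}
```

Allocating all the buffers up front, before any thread starts, means a shortage shows up once at start-up instead of as a spike in the middle of a run.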

Barry

Multicore Mobile Mainstream

January 12th, 2009

Believe it or not, there were a few rumors at this month’s Macworld Expo that didn’t speculate about Steve Jobs. One of the most interesting was posted by Jason O’Grady at ZDNet. Jason reported rumors about a mysterious quad-core iPhone and a corresponding firmware 3.0 that would include support for the quad-core processor. If true, this would be a significant milestone for multicore use on mainstream consumer platforms.

Of course, there are already embedded multicore devices in production today. Even though they’re on the market, a significant percentage of these devices are still running single-core software. Although this allows independent applications to run in parallel, it doesn’t lower each application’s power profile or, conversely, allow each application greater throughput. To achieve these benefits, application software must be multicore ready.

In these challenging economic times, new platforms must disrupt markets. There is little profit in launching a quad-core phone without stunning benefits to the end consumer, and that means applications must take full advantage of the multicore architecture. Likewise, getting the most out of existing multicore platforms can extend their product lifetime revenues, and software enhancements are the way to do it. Either way, the time to make applications multicore friendly is now.

For multithreaded platforms programmed in C, Pthreads is the natural choice. The Pthreads API is available on many operating systems, and the Pthreads library is especially well supported on Linux variants. Applications can be parallelized and debugged using Pthreads, and they will still run correctly in a single-core environment. An unexpected benefit of threading serial code is that some previously undiscovered bugs are likely to be flushed out during the refactoring.

In some cases, OpenMP may be an attractive initial approach. OpenMP uses a set of compiler directives to describe parallelism available in the serial code. Maximum parallel performance may not be achieved in all cases. Offsetting that, the code can always be run serially by simply disabling the directives during compilation.
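As a small illustration of the directive style, the sketch below sums an array with an OpenMP reduction. The function name sum_array is made up for this example. Built with OpenMP enabled (for instance with -fopenmp on GCC), the loop iterations are shared among threads; built without it, the pragma is ignored and the same loop runs serially.

```c
#include <stddef.h>

/* Sums the n values in data. The reduction clause gives each thread its own
 * partial total and combines the partial totals when the loop finishes. */
double sum_array(const double *data, size_t n)
{
    double total = 0.0;

    /* A signed loop index keeps the loop acceptable to older OpenMP versions. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++)
        total += data[i];

    return total;
}
```

Because the threads may add their partial totals in a different order from the serial loop, floating-point results can differ in the last few bits between the two builds.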

Even if you are not immediately releasing multithreaded applications, ensuring that libraries and drivers are thread-safe provides significant leverage going forward. When you do discover an application requiring the extra performance or lower power of multicore, having multicore-ready libraries and drivers will enable you to respond quickly, with a shorter learning curve and better debug isolation.
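What "thread-safe" means in practice depends on the library, but the most common first step is protecting shared state with a mutex. The sketch below shows a hypothetical event counter (record_event and events_recorded are invented names) plus a small stress routine that calls it from several threads at once.

```c
#include <pthread.h>

#define MAX_WORKERS 8

/* Hypothetical library state shared by every caller. */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long event_count = 0;

/* Thread-safe: the read-modify-write happens while holding the lock. */
void record_event(void)
{
    pthread_mutex_lock(&count_lock);
    event_count++;
    pthread_mutex_unlock(&count_lock);
}

/* Returns the number of events recorded so far. */
unsigned long events_recorded(void)
{
    pthread_mutex_lock(&count_lock);
    unsigned long n = event_count;
    pthread_mutex_unlock(&count_lock);
    return n;
}

/* Worker used only to exercise the library from several threads. */
static void *record_many(void *arg)
{
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++)
        record_event();
    return NULL;
}

/* Starts up to MAX_WORKERS threads that each record per_thread events,
 * waits for them all, and returns the total recorded so far. */
unsigned long stress_record_event(int workers, int per_thread)
{
    pthread_t ids[MAX_WORKERS];
    int started = 0;

    if (workers > MAX_WORKERS)
        workers = MAX_WORKERS;
    for (int i = 0; i < workers; i++) {
        if (pthread_create(&ids[started], NULL, record_many, &per_thread) != 0)
            break;                /* join only the threads that started */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(ids[i], NULL);
    return events_recorded();
}
```

Without the mutex the increments from different threads can overwrite each other and the final count comes up short, which is exactly the kind of bug that is much cheaper to find in a library test than in a shipping multicore product.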

Signs are that 2009 is the year for multicore to impact the mainstream embedded market. Making existing software multicore ready will pay dividends to developers willing to make the investment now, even when times are tough.

Skip