It’s Just Not Fair
Whilst debugging some pthread code recently I came across a somewhat unexpected and seemingly non-intuitive result regarding locks. Here is a simple pthread example program. The main creates a single child thread. This thead that works away incrementing the sGlobal counter until sContinue is cleared by main. The main simply checks the count and stops the child when the count exceeds 10. Of course, the accesses to sGlobal are protected by a mutex to ensure that a partially updated result isn’t read and there are no nasty data races (okay, so an int can be read atomically on pretty much any machine but sGlobal could be a more complex object).
static int sGlobal;
static int sContinue;
static pthread_mutex_t sMutex;
static void* childthread(void* p)
{
do
{
pthread_mutex_lock(&sMutex);
sGlobal++;
printf("Count = %d\n", sGlobal);
pthread_mutex_unlock(&sMutex);
}
while (sContinue);
}
int main(int argc, char *argv[])
{
pthread_t aThreadID;
int aCount;
pthread_mutex_init(&sMutex, NULL);
sContinue = 1;
sGlobal = 0;
pthread_create(&aThreadID, NULL, childthread, NULL);
do
{
pthread_mutex_lock(&sMutex);
aCount = sGlobal;
pthread_mutex_unlock(&sMutex);
}
while (aCount < 10);
sContinue = 0;
}
So what would you expect the output to be? Maybe a count 0 to 10 and then the program exits? I was expecting that main thread and the child thread to contend for the lock but ultimately for both of them to get pretty fair access to it. If the child thread has the lock but the main thread wants it I was expecting it to be next in line for acquiring it. Unfortunately there isn’t an orderly queue waiting for the lock, its a total free for all. As it turns out pthread mutexes provide absolutely no guarantee of fairness for lock accesses whatsoever. It doesn’t matter how long a thread has been waiting for a lock, another thread can push into the queue and get the lock again even after its just released it.
So what do you get when you run this. Well, it all depends. I ran this on a Linux Redhat 4 system with NPTL threads and the final count at termination were for successive runs was 641, 573, 229, 29, 25, 14791, 1708. Spot any pattern? No, neither can I. I also tried it on an ARM Debian Linux version with the older LinuxThreads implementation. Its still going. So this is rather random and very non-portable.
The child thread is hogging the lock. Even though it releases it, it grabs it again very quickly afterwards. Unless the process happens to get pre-empted just at the that point it will get the lock back again, even if the main thread has been waiting for ages. If most of the processing is done while the lock is held then there is very little opportunity for another thread to acquire the lock.
So why this rather rude free-for-all with locks? Its all in the name of efficiency. If orderely queues were maintained for each lock then every time a thread wants to acquire a lock it would need to be polite by checking if any other thread were waiting rather than just grabbing it. The queue itself would probably also need to be protected with locks since it could be accessed by any thread, and so it goes on. Its just simple and efficient to grab the lock if its available, no questions asked. Its up to the programmer to ensure fairness and synchronisation using other methods such as condition variables.
And what was the answer for ARM LinuxThreads? Well, its at 389000 and still counting. I suspect its never going to halt, but you never know (as Turing famously proved…).