Malcolm Groves — “But what happens if…” : The Joy of Race Conditions

In my last post, I was trying to highlight the fact that just because you have done a WaitForAny and one of your Tasks has ended, the others don’t just all magically stop somehow. It’s good form for your Task to be checking if it has been Cancelled, and to quit as soon as possible, certainly before making any changes outside the context of the Task.

However, as I mentioned at the end of that post, we’re still not done. There is a window of time, albeit a small one, where things could go wrong, and in this post I want to explore what the problem is and how to resolve it. The main point, though, is to encourage you to get in the habit of thinking about these types of issues: to have little conversations with yourself that usually start with “OK, but what happens if…” and go on to outline the perfect storm of bad timing that could lead to unexpected problems.

Let’s have a look at the code we ended with last time:

procedure TFormThreading.Button2Click(Sender: TObject);
var
  tasks: array of ITask;
  value: Integer;
  LTask: ITask;
begin
  SetLength(tasks, 2);
  value := 0;

  tasks[0] := TTask.Create(procedure
                           begin
                             Sleep(3000);
                             if tasks[0].Status <> TTaskStatus.Canceled then
                               TInterlocked.Add(value, 3000);
                           end);
  tasks[0].Start;

  tasks[1] := TTask.Create(procedure
                           begin
                             Sleep(5000);
                             // Race: this check can pass just before Cancel is called below
                             if tasks[1].Status <> TTaskStatus.Canceled then
                               TInterlocked.Add(value, 5000);
                           end);
  tasks[1].Start;

  TTask.WaitForAny(tasks);
  // Ask every task to stop once the first one has finished
  for LTask in tasks do
    LTask.Cancel;
  ShowMessage('All done: ' + value.ToString);
end;

The conversation in my head went something like this:

“What if WaitForAny returns, but before I can signal the second Task to cancel, it finishes sleeping and updates Value?”

Small chance, you might say, and possibly you’re right, but it could happen, and depending on the application, might not be a desirable result.

There are a number of options for controlling access to shared items. Check out the SyncObjs unit, and also the various Monitor* routines in the System unit, such as MonitorEnter.
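As a rough sketch of the lock-based route, here is a small console program that guards the check-and-update with a TCriticalSection from SyncObjs. The program name, the UpdateIfUnset helper and the shortened sleeps are made up for illustration, and it waits on both tasks with WaitForAll so that the second update is actually attempted.

```delphi
program GuardedUpdate;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs, System.Threading;

var
  Lock: TCriticalSection;
  Value: Integer;
  Tasks: array of ITask;

// Hypothetical helper: stores Candidate only if nothing has been stored yet.
// The lock makes the check and the assignment a single unit, so no other
// task can slip in between them.
procedure UpdateIfUnset(Candidate: Integer);
begin
  Lock.Enter;
  try
    if Value = 0 then
      Value := Candidate;
  finally
    Lock.Leave;
  end;
end;

begin
  Lock := TCriticalSection.Create;
  try
    Value := 0;
    SetLength(Tasks, 2);

    // Sleeps are shortened from the post's 3000/5000 ms to keep the demo quick
    Tasks[0] := TTask.Run(procedure
                          begin
                            Sleep(300);
                            UpdateIfUnset(3000);
                          end);
    Tasks[1] := TTask.Run(procedure
                          begin
                            Sleep(500);
                            UpdateIfUnset(5000);
                          end);

    // Wait for both so the later task really does try to update Value
    TTask.WaitForAll(Tasks);
    Writeln('Value: ', Value);
  finally
    Lock.Free;
  end;
end.
```

Whichever task reaches UpdateIfUnset first stores its number; the other finds Value already non-zero and leaves it alone. A lock like this is more machinery than a single Integer needs, but the same shape works when the shared state is bigger than one variable.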

But let’s step back for a minute and think about what we’re actually trying to do.

What we want is a way to say “Only update Value if no other tasks have already updated it”.

That’s a little different to what I was discussing in the last post. There I was saying “Don’t update Value if I’ve been cancelled”. If you think about it, I was kinda squishing two requirements into one there. One requirement was to exit the task quickly if I’ve been cancelled, the other was not to update Value if another task already had. By trying to combine them, I ended up with a race condition.

So the first step is to separate them:

1. As mentioned last post, regularly check if you have been cancelled during your Task and stop what you’re doing.
2. Separately, let’s check if another Task has already updated Value before we do it. Further, let’s do it in an atomic way, so that another Task can’t jump in between the check and the update. I’m setting Value to zero before starting either of the tasks, so if it’s non-zero, I know one of the other tasks has finished.
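The first of these requirements can be sketched on its own. The console program below breaks one long wait into short slices and checks the task's Status between slices, using the same Status test as the code above. The program name, the Finished event, the Slices counter and the slice length are all assumptions made for illustration.

```delphi
program PollForCancel;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs, System.Threading;

var
  Task: ITask;
  Finished: TEvent;   // hypothetical: lets the main thread know the task body has exited
  Slices: Integer;    // hypothetical: counts how many slices of work were done

begin
  Slices := 0;
  Finished := TEvent.Create(nil, True, False, '');
  try
    Task := TTask.Create(procedure
                         var
                           I: Integer;
                         begin
                           try
                             // 50 slices of 100 ms instead of one 5000 ms sleep
                             for I := 1 to 50 do
                             begin
                               // Same check as in the post, repeated between slices
                               if Task.Status = TTaskStatus.Canceled then
                                 Exit;
                               Sleep(100);
                               Inc(Slices);
                             end;
                           finally
                             Finished.SetEvent;
                           end;
                         end);
    Task.Start;

    Sleep(250);        // let the task get a couple of slices in
    Task.Cancel;       // request cancellation while it is still running
    Finished.WaitFor;  // wait for the task body to notice and leave

    Writeln('Slices completed before stopping: ', Slices);
  finally
    Finished.Free;
  end;
end.
```

The shorter the slice, the sooner the task notices the request. Note that this only deals with exiting promptly; it says nothing about whether a shared value should be updated, which is the second requirement.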

The solution is actually staring me in the face, but thanks are owed to Allen for being reasonably kind when pointing it out to me.

I’m already using TInterlocked.Add (which was a holdover from when this demo was using WaitForAll instead of WaitForAny). I can change that to TInterlocked.CompareExchange, like so:

 
tasks[1] := TTask.Create(procedure
                         begin
                           Sleep(5000);
                           if tasks[1].Status <> TTaskStatus.Canceled then
                             // Store 5000 only if value is still 0; compare and store happen as one atomic step
                             TInterlocked.CompareExchange(value, 5000, 0);
                         end);

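This says compare Value to zero, and if they are equal, set it to 5000. If Value is anything other than zero, it is left untouched. Either way, the call returns whatever Value held before, and the compare and the update happen as a single atomic operation. I make the same change in the other Task, and we’re done.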
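To see the semantics in isolation, here is a minimal single-threaded sketch that calls TInterlocked.CompareExchange twice on the same variable. The program name and the Previous variable are made up for illustration; the 3000 and 5000 values mirror the two tasks.

```delphi
program FirstWriterWins;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs;

var
  Value, Previous: Integer;

begin
  Value := 0;

  // Value is 0, which matches the comparand, so 3000 is stored.
  // The return value is what Value held before the call.
  Previous := TInterlocked.CompareExchange(Value, 3000, 0);
  Writeln(Previous, ' ', Value);  // prints "0 3000"

  // Value is now 3000, which does not match the comparand, so nothing changes.
  Previous := TInterlocked.CompareExchange(Value, 5000, 0);
  Writeln(Previous, ' ', Value);  // prints "3000 3000"
end.
```

The return value is worth a look in real code: a result of 0 means this caller was the one that set Value, while anything else means another task got there first.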

The main takeaway from this, I think, is that while the Parallel Programming Library makes it much easier to start parallelising parts of your application, it doesn’t give you a free pass to ignore the potential for errors that parallelisation can bring. So I’m off to find my old copy of Concurrent Programming on Windows by Joe Duffy, a book Allen recommended during his recent CodeRage 9 session, and brush up a little.
