Re: kernel timer races

From: Andrew Morton (andrewm@uow.edu.au)
Date: Fri May 26 2000 - 23:37:04 EST


Andrey Savochkin wrote:
>
> The places where handler kfree's the timer and the timer can be deleted by
> del_timer_sync, would be very broken.

Yes, once you've decided to kfree the timer in the handler, then usually
that's the _only_ sane place to free it, and the mainline shouldn't try
to delete it.

I've found a couple of handlers which kfree the timer (and do a
MOD_DEC_USE_COUNT):

        net/appletalk/ddp.c:atalk_destroy_timer()

Nice code and quite safe.

        net/core/sock.c:sklist_destroy_timer()

Same trick: self-destroying skbuff.

Also net/appletalk/aarp.c frees skbuffs, but not the controlling timer.

There may be others.

> ...
>
> In you spintimers.txt I see only one fishy spot with this respect:
> ./net/ipv6/reassembly.c. It looks like it's intrinsicly broken.

mm.. Need to go through many things like this very carefully. My
approach will be to, when in doubt, leave the code using del_timer_async
(ie: unchanged) and then to work through the questionable paths over the
next few weeks.

> > Sure. But del_timer() + wait_on_timers() + dont-re-add-flag ==
> > del_timer_sync()?
>
> It's more than del_timer_sync() because it ensures that the code segment
> isn't used by timer handler, and, thus, is suitable for module cleanups.

Could you please explain this a little more? I don't see it.

mainline
========

andrews_del_timer()
MOD_DEC_USE_COUNT
 
This is safe, as long as the handler doesn't do a MOD_DEC??

Were you referring to the existing (2.3.99) del_timer_sync()?

> I don't insist on any of the options.
> But you have certainly started a hard and painful work :-)

uh-oh.

> I've looked more precisely over your timer-race-6 patch.
> It has not so big overhead as I expected for the first time.
> However, I think it has a potential problem. In del_timer() you drop the
> spinlock, wait for the handler completion, and then acquire the lock again
> and check if the timer has been added again. It doesn't look very robust.

You are correct.

But remember that this whole code path is a one-in-a-million thing. So
having another one-in-a-million race within it has improved things by,
umm, a million. Plus if it ever does happen, we blurt out a warning,
and the end result is the same as what we have now.

But I'll look at adding a 'goto retry' into that code path. Just have
to work out how to test it...

> You relies on what may happen and what may not in the period when you do not
> hold the lock. What if the timer has been readded by the handler and then
> successfully deleted on third CPU?

Right. So instead of having:

  if (!timer_pending(timer))
      printk("That's wierd. It isn't pending!\n");
  ret += detach_timer(timer);
  timer->list.next = timer->list.prev = 0;

it should be:

  if (!timer_pending(timer)) {
      printk("That's wierd. It isn't pending!\n");
  } else {
      ret += detach_timer(timer);
      timer->list.next = timer->list.prev = 0;
  }

> I suggest to assume that if del_timer is called than the timer cannot be free'd
> or destroyed. If it isn't true, there will be race conditions in any case.
> The assumption will simplify things a lot.

That is the assumption which the existing del_timer_sync() makes.

Having recently looked at a zillion timer handlers I believe this
assumption is safe. But adding timer_exit() to every timer handler is
unattractive for obvious reasons (ditto the reinsert-protection flag).
The existing del_timer_sync also raises the possibility of the module
being unloaded prior to the handler returning, but this is so
ridiculously improbable that even I wouldn't make a fuss about it.

> What do you think about it?

I think I need to think some more :) Fun stuff.

I hereby do solemnly swear to write Documentation/kernel-timers.txt!

-- 
-akpm-

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed May 31 2000 - 21:00:17 EST