Re: [RFC][PATCH] timers: Add del_time_free() to be called before freeing timers

From: Thomas Gleixner
Date: Fri Apr 08 2022 - 16:30:06 EST


On Fri, Apr 08 2022 at 07:33, Linus Torvalds wrote:
> On Fri, Apr 8, 2022 at 12:37 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>
>> So this would become:
>>
>> - BUG_ON(!timer->function);
>> + if (WARN_ON(!timer->function))
>> + return -EBROKEN;
>
> Yes. But please make it a WARN_ON_ONCE(), just on basic principles. I
> can't imagine this happening a lot, but at the same time I don't think
> there's any reason _not_ to just always use WARN_ON_ONCE() for these
> kinds of "serious bug, but should never happen" situations.
>
> Because we don't want some "user can trigger this and spam the logs"
> situation either.

Fair enough.

> That said, I would actually prefer a name-change: instead of making
> this about "del_timer_free()", can we please just make this about it
> being "final". Or maybe "del_timer_cancel()" or something like that?

I was anyway thinking to use the timer_* namespace if we want to go for
this.

timer_shutdown() perhaps?

> Because the actual _freeing_ will obviously happen later, and the
> function does nothing of the sort. In fact, there may be situations
> where you don't free it at all, but just want to be in the situation
> where you want to make sure there are no pending timers until after
> you explicitly re-arm it, even if the timer would otherwise be
> self-arming.
>
> (That use-case would actually mean removing the WARN_ON_ONCE(), but I
> think that would be a "future use" issue, I'm *not* suggesting it not
> be done initially).

Well, you'd have to reinitialize it first before the explicit rearm
because the shutdown cleared the function pointer or if we use a flag
then the flag would prevent arming it again.

> I also suspect that 99% of all del_timer_sync() users actually want
> that kind of explicit "del_timer_final()" behavior. Some may not
> _need_ it (because their re-arming already checks for "have I shut
> down?"), but I have this suspicion that we could convert a lot - maybe
> all - of the current del_timer_sync() users to this and try to see if
> we could just make it the rule.

Hmm. That would mean, that we still check the function pointer for NULL
without warning and just return. That would indeed be a good argument
for not having the warning at all.

But, there is a twist. While most callsites ignore the return value of
mod_timer() there are 39 (about 2%) which actually care. So that needs
some thought. Though code which cares about these details is mostly
networking core code and not the random driver thing. Though there are a
few suspects in drivers too including bluetooth, but the latter is what
started this whole discussion. See below.

> And then we could actually maybe start removing the explicit "shut
> down timer" code. See for example "work->canceling" logic for
> kthreads, which now does double duty (it disables re-arming the timer,
> _and_ it does some "double cancel work avoidance")

This serializes the cancel against _any_ queuing of that work including
the queueing from an already running timer callback which is blocked on
the worker lock. The shutdown thing wont help there I think.

The problem where it could help is shutdown code for timers plus other
entities with circular dependencies. The problem which started this was
some bluetooth thing which has two timers and a workqueue. The timers
can queue work and the workqueue can arm timers....

So well written drivers have a priv->shutdown flag which makes timer
callbacks and workqueue functions aware that a shutdown is in progress
so they can take appropriate action. That's not necessarily trivial and
I've decoded my share of subtle problems in that realm over the years.

The bluetooth thing in question does not fall into that category, it
even just used del_timer() before destroying the work queue.

What a shutdown function would prevent here is UAF, but I'm not entirely
sure whether it will simplify coordinated shutdown and remove the
requirement of a priv->shutdown flag all over the place. It might make
some of the driver muck just get stuck in the shutdown, but that's
definitely an improvement over a potential UAF which happens every blue
moons.

Thanks,

tglx