Re: [RFC][PATCH] sched: Fix TASK_DEAD race in finish_task_switch()

From: Peter Zijlstra
Date: Tue Sep 29 2015 - 12:56:08 EST


On Tue, Sep 29, 2015 at 12:40:22PM -0400, Linus Torvalds wrote:
> On Tue, Sep 29, 2015 at 8:45 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > + *
> > + * Pairs with the control dependency and rmb in try_to_wake_up().
> > */
>
> So this comment makes me nervous. A control dependency doesn't
> actually do anything on powerpc and ARM (or alpha, or MIPS, or any
> number of other architectures. Basically, a conditional branch ends up
> not being the usual kind of data dependency (which works on everything
> but alpha), because conditional branches are predicted and loads after
> them are speculated.

The control dependency creates a LOAD->STORE order, that is, no STOREs
can happen until we observe !p->on_cpu.

while (p->on_cpu)
cpu_relax();

The subsequent:

smp_rmb();

ensures no loads can pass up before the ->on_cpu load. Combined they
provide a LOAD->(LOAD/STORE) barrier.

Nothing is allowed to pass up over that combination -- not unlike an
smp_load_acquire() but without the expensive MB.

> Also, using smp_store_release() instead of a wmb() is going to be very
> expensive on old ARM and a number of other not-so-great architectures.

Yes :-(

> On x86, both end up being just a scheduling thing. On other modern
> architectures, store releases are fairly cheap, but wmb() is cheap
> too.

Right, but wmb isn't sufficient as it doesn't order the prev->state LOAD
vs the prev->on_cpu = 0 STORE. If those happen in the wrong order the
described race can happen and we get a use-after-free.

Of course, x86 isn't affected (the reorder is disallowed by TSO) nor is
PPC (its wmb() is lwsync which also disallows this reorder).

But ARM/MIPS etc.. are affected and these are the archs now getting the
full barrier.

(And note that arm64 gets to use their store-release instruction, which
might be better than the full barrier -- but maybe not as great as their
wmb, I have no idea on their relative costs.)

> So long-term, the wmb->store_release conversion probably makes sense,
> but it's at least debatable for now.

I'm all open to alternative solutions to this race.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/