Re: [PATCH] driver core: fix shutdown races with probe/remove

From: Alan Stern
Date: Tue Jun 05 2012 - 13:09:41 EST


On Tue, 5 Jun 2012, Ming Lei wrote:

> On Tue, Jun 5, 2012 at 10:47 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Avoid running probe, that's fine.  But avoiding remove can lead to
> > problems, because the subsystem and the driver will no longer agree on
> > who should manage the device.
>
> After device_shutdown() has been called, the whole system will enter power down
> or reset later, so it doesn't matter if who should manage the device.

Maybe. But there might be quite some time between the shutdown call
and the eventual power-off or reboot.

> Also once shutdown callback is called for the device, looks its other callbacks
> should not touch the device any more.

You shouldn't depend on that. Shutdown methods generally put the
device into a state suitable for power-off or reboot; they don't often
guarantee that the driver won't change the state later on.

On the whole, it might be easier just to hold the device lock during
the shutdown call.

> >> @@ -1803,6 +1806,9 @@ void device_shutdown(void)
> >>  {
> >>       struct device *dev;
> >>
> >> +     ACCESS_ONCE(device_shutdown_started) = 1;
> >
> > This does not need to be ACCESS_ONCE.  ACCESS_ONCE is needed only in
> > situations where the compiler might insert multiple reads or writes.
>
> I have talked with Paul about the ACCESS_ONCE, and Paul thought it is needed
> to prevent the compiler from doing byte-at-a-time stores.

Why on earth would a compiler change this into a byte-at-a-time store?
And even if it did, what problem would that cause? Only one of the
bytes would be different from the original value.

> Maybe Paul has a detailed explanation about it.

> >>  static int really_probe(struct device *dev, struct device_driver *drv)
> >>  {
> >>       int ret = 0;
> >> +     int idx;
> >>
> >> +     idx = srcu_read_lock(&driver_srcu);
> >>       atomic_inc(&probe_count);
> >> +     if (ACCESS_ONCE(device_shutdown_started))
> >
> > This doesn't need to be ACCESS_ONCE either.  If it would make you feel
>
> Because the global variable of device_shutdown_started is not changed inside
> the function, the compiler may put the above line after the line of
> 'dev->driver = drv',

No, it can't do that. If the compiler moved this read farther down,
and as a result the CPU wrote to dev->driver when it shouldn't have,
the result would be incorrect (i.e., different from what would have
happened if the statement hadn't been moved). Hence the compiler is
prohibited from making such an optimization.

> so introduce the ACCESS_ONCE to avoid the compiler optimization.
>
> > better, you could write
> >
> >        if (rcu_dereference(&device_shutdown_started))
>
> The usage of SRCU refers to the part 'Cleaning Up Safely' of the SRCU
> paper 'Sleepable RCU'[1]. Also the patch doesn't call rcu_assign_pointer,
> so rcu_dereference isn't needed.

True, it's not needed any more than ACCESS_ONCE is.

> > As mentioned above, this is not good.  I don't know... maybe it won't
> > ever cause any difficulties.
>
> Looks same with above.

You could try going into an infinite loop here, but that's probably not
a good idea.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/