Re: [PATCH v4 2/6] can: etas_es58x: add devlink support

From: Alan Stern
Date: Mon Nov 28 2022 - 10:50:45 EST


On Mon, Nov 28, 2022 at 02:32:23PM +0900, Vincent MAILHOL wrote:
> On Mon. 28 Nov. 2022 at 10:34, Vincent MAILHOL
> <mailhol.vincent@xxxxxxxxxx> wrote:
> > On Mon. 28 Nov. 2022 at 00:41, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> > > On Sun, Nov 27, 2022 at 02:10:32PM +0900, Vincent MAILHOL wrote:
> > > > > Should devlink_free() be after usb_set_inftdata()?
> > > >
> > > > A look at
> > > > $ git grep -W "usb_set_intfdata(.*NULL)"
> > > >
> > > > shows that the two patterns (freeing before or after
> > > > usb_set_intfdata()) coexist.
> > > >
> > > > You are raising an important question here. usb_set_intfdata() does
> > > > not have documentation that freeing before it is risky. And the
> > > > documentation of usb_driver::disconnect says that:
> > > > "@disconnect: Called when the interface is no longer accessible,
> > > > usually because its device has been (or is being) disconnected
> > > > or the driver module is being unloaded."
> > > > Ref: https://elixir.bootlin.com/linux/v6.1-rc6/source/include/linux/usb.h#L1130
> > > >
> > > > So the interface no longer being accessible makes me assume that the
> > > > order does not matter. If it indeed matters, then this is a foot gun
> > > > and there is some clean-up work waiting for us on many drivers.
> > > >
> > > > @Greg, any thoughts on whether or not the order of usb_set_intfdata()
> > > > and resource freeing matters or not?
> > >
> > > In fact, drivers don't have to call usb_set_intfdata(NULL) at all; the
> > > USB core does it for them after the ->disconnect() callback returns.
> >
> > Interesting. This fact is widely unknown, cf:
> > $ git grep "usb_set_intfdata(.*NULL)" | wc -l
> > 215
> >
> > I will do some clean-up later on, at least for the CAN USB drivers.
> >
> > > But if a driver does make the call, it should be careful to ensure that
> > > the call happens _after_ the driver is finished using the interface-data
> > > pointer. For example, after all outstanding URBs have completed, if the
> > > completion handlers will need to call usb_get_intfdata().
> >
> > ACK. I understand that it should be called *after* the completion of
> > any ongoing task.
> >
> > My question was more on:
> >
> > devlink_free(priv_to_devlink(es58x_dev));
> > usb_set_intfdata(intf, NULL);
> >
> > VS.
> >
> > usb_set_intfdata(intf, NULL);
> > devlink_free(priv_to_devlink(es58x_dev));
> >
> > From your comments, I understand that both are fine.
>
> Do we agree that the usb-skeleton is doing it wrong?
> https://elixir.bootlin.com/linux/latest/source/drivers/usb/usb-skeleton.c#L567
> usb_set_intfdata(interface, NULL) is called before deregistering the
> interface and terminating the outstanding URBs!

Going through the usb-skeleton.c source code, you will find that
usb_get_intfdata() is called from only a few routines:

skel_open()
skel_disconnect()
skel_suspend()
skel_pre_reset()
skel_post_reset()

Of those, all but the first are called only by the USB core and they are
mutually exclusive with disconnect processing (except for
skel_disconnect() itself, of course). So they don't matter.

The first, skel_open(), can be called as a result of actions by the
user, so the driver needs to ensure that this can't happen after it
clears the interface-data pointer. The user can open the device file at
any time before the minor number is given back, so it is not proper to
call usb_set_intfdata(interface, NULL) before usb_deregister_dev() --
but the driver does exactly this!

(Well, it's not quite that bad. skel_open() does check whether the
interface-data pointer value it gets from usb_get_intfdata() is NULL.
But it's still a race.)

So yes, the current code is wrong. And in fact, it will still be wrong
even after the usb_set_intfdata(interface, NULL) line is removed,
because there is no synchronization between skel_open() and
skel_disconnect(). It is possible for skel_disconnect() to run to
completion and the USB core to clear the interface-data pointer all
while skel_open() is running. The driver needs a static private mutex
to synchronize opens with unregistrations. (This is a general
phenomenon, true of all drivers that have a user interface such as a
device file.)

The driver _does_ have a per-instance mutex, dev->io_mutex, to
synchronize I/O with disconnects. But that's separate from
synchronizing opens with unregistrations, because at open time the
driver doesn't yet know the address of the private data structure or
even if the structure is still allocated. So obviously it can't use a
mutex that is embedded within the private data structure for this
purpose.

Alan Stern