Re: [PATCH v1] driver core: Allow device link operations inside sync_state()

From: Rafael J. Wysocki
Date: Thu Nov 14 2019 - 04:13:51 EST


On Wed, Nov 13, 2019 at 3:36 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> Some sync_state() implementations might need to call APIs that in turn
> make calls to device link APIs. So, do the sync_state() callbacks
> without holding the device link lock.
>
> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> ---
> drivers/base/core.c | 63 +++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 55 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index e6d3e6d485da..d396b0597c10 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -48,6 +48,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> static LIST_HEAD(wait_for_suppliers);
> static DEFINE_MUTEX(wfs_lock);
> static LIST_HEAD(deferred_sync);
> +static LIST_HEAD(sync_list);
> +static DEFINE_MUTEX(sync_lock);
> static unsigned int defer_sync_state_count = 1;
>
> #ifdef CONFIG_SRCU
> @@ -695,7 +697,23 @@ int device_links_check_suppliers(struct device *dev)
> return ret;
> }
>
> -static void __device_links_supplier_sync_state(struct device *dev)
> +/** __device_links_queue_sync_state - Queue a device for sync_state() callback
> + * @dev: Device to call sync_state() on
> + *
> + * Queues a device for a sync_state() callback when the device links write lock
> + * isn't held. This allows the sync_state() execution flow to use device links
> + * APIs. The caller must ensure this function is called with
> + * device_links_write_lock() held.
> + *
> + * This function does a get_device() to make sure the device is not freed while
> + * on this list.
> + *
> + * So the caller must also ensure that device_links_flush_sync_list() is called
> + * as soon as the caller releases device_links_write_lock(). This is necessary
> + * to make sure the sync_state() is called in a timely fashion and the
> + * put_device() is called on this device.
> + */
> +static void __device_links_queue_sync_state(struct device *dev)
> {
> struct device_link *link;
>
> @@ -709,12 +727,35 @@ static void __device_links_supplier_sync_state(struct device *dev)
> return;
> }
>
> - if (dev->bus->sync_state)
> - dev->bus->sync_state(dev);
> - else if (dev->driver && dev->driver->sync_state)
> - dev->driver->sync_state(dev);
> -
> dev->state_synced = true;
> +
> + mutex_lock(&sync_lock);

Total nit: I add empty lines around lock/unlock as a rule to make them
more visible.

> + WARN_ON(!list_empty(&dev->links.defer_sync));
> + if (list_empty(&dev->links.defer_sync)) {

Do you really need to duplicate that check?

> + get_device(dev);
> + list_add_tail(&dev->links.defer_sync, &sync_list);
> + }
> + mutex_unlock(&sync_lock);
> +}

What about adding

} else {
WARN_ON(1);
}

here instead?

> +

Kerneldoc?

> +static void device_links_flush_sync_list(void)
> +{
> + struct device *dev, *tmp;
> +
> + mutex_lock(&sync_lock);
> +
> + list_for_each_entry_safe(dev, tmp, &sync_list, links.defer_sync) {
> + list_del_init(&dev->links.defer_sync);
> + device_lock(dev);
> + if (dev->bus->sync_state)
> + dev->bus->sync_state(dev);
> + else if (dev->driver && dev->driver->sync_state)
> + dev->driver->sync_state(dev);
> + device_unlock(dev);
> + put_device(dev);
> + }
> +
> + mutex_unlock(&sync_lock);
> }
>
> void device_links_supplier_sync_state_pause(void)
> @@ -738,11 +779,16 @@ void device_links_supplier_sync_state_resume(void)
> goto out;
>
> list_for_each_entry_safe(dev, tmp, &deferred_sync, links.defer_sync) {
> - __device_links_supplier_sync_state(dev);
> + /*
> + * Delete from deferred_sync list before queuing it to
> + * sync_list because defer_sync is used for both lists.
> + */
> list_del_init(&dev->links.defer_sync);
> + __device_links_queue_sync_state(dev);
> }
> out:
> device_links_write_unlock();
> + device_links_flush_sync_list();

Wouldn't it be better to use a local list in this function instead of
the global sync_list?

I guess the idea is that you wouldn't be able to do the flush in
device_links_driver_bound() below, but do you really need that flush?

It looks like this is the only place calling
__device_links_queue_sync_state() and you do a flush right away after
the loop, so why is the extra flush in device_links_driver_bound()
needed?

> }
>
> static int sync_state_resume_initcall(void)
> @@ -815,12 +861,13 @@ void device_links_driver_bound(struct device *dev)
> if (defer_sync_state_count)
> __device_links_supplier_defer_sync(link->supplier);
> else
> - __device_links_supplier_sync_state(link->supplier);
> + __device_links_queue_sync_state(link->supplier);
> }
>
> dev->links.status = DL_DEV_DRIVER_BOUND;
>
> device_links_write_unlock();
> + device_links_flush_sync_list();

It looks like devices can be added to sync_list in parallel with each
other and so is it always OK to always flush all of them after one of
them has been bound to a driver?

> }
>
> static void device_link_drop_managed(struct device_link *link)
> --