Re: [PATCH v4 08/12] PCI: Introduce /sys/bus/pci/devices/.../remove

From: Andrew Morton
Date: Thu Mar 19 2009 - 05:50:21 EST


On Wed, 18 Mar 2009 16:40:06 -0600 Alex Chiang <achiang@xxxxxx> wrote:

> This patch adds an attribute named "remove" to a PCI device's sysfs
> directory. Writing a non-zero value to this attribute will remove the PCI
> device and any children of it.
>
> Trent Piepho wrote the original implementation and documentation.
>
> Thanks to Vegard Nossum for testing under kmemcheck and finding locking
> issues with the sysfs interface.
>
> ...
>
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -246,6 +246,47 @@ struct bus_attribute pci_bus_attrs[] = {
>  	__ATTR(rescan, S_IWUSR, NULL, bus_rescan_store),
>  	__ATTR_NULL
>  };
> +
> +static void remove_callback(struct device *dev)
> +{
> +	int bridge = 0;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	mutex_lock(&pci_remove_rescan_mutex);
> +
> +	if (pdev->subordinate)
> +		bridge = 1;
> +
> +	pci_remove_bus_device(pdev);
> +	if (bridge && list_empty(&pdev->bus->devices))
> +		pci_remove_bus(pdev->bus);
> +
> +	mutex_unlock(&pci_remove_rescan_mutex);
> +}
> +
> +static ssize_t
> +remove_store(struct device *dev, struct device_attribute *dummy,
> +	     const char *buf, size_t count)
> +{
> +	int ret = 0;
> +	unsigned long val;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	if (strict_strtoul(buf, 0, &val) < 0)
> +		return -EINVAL;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (pdev->subordinate && pci_is_root_bus(pdev->bus))
> +		return -EBUSY;
> +
> +	if (val)
> +		ret = device_schedule_callback(dev, remove_callback);
> +	if (ret)
> +		count = ret;
> +	return count;
> +}
> #endif

It is very hard for the reader (this one at least) to work out why
device_schedule_callback() is used here, instead of simply doing the work
directly.

The way to solve that problem is to add a code comment.

Given that we're in a sysfs write() handler where no relevant locks at all
are held, it seems rather weird that we cannot perform this operation
synchronously, but no doubt the comment will explain all of this.
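
One plausible explanation, offered as an assumption rather than something the patch states: tearing down a sysfs file has to wait until every handler currently running on that file has returned, so a store() handler that removes its own device would end up waiting on itself. The user-space model below sketches that reasoning. Every model_* name is hypothetical and invented for illustration, none of it is kernel API, and "would block forever" is reported as a false return instead of an actual sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysfs attribute file; not a kernel type. */
struct model_attr {
	int active;	/* handlers currently running on this file */
	bool removed;	/* set once the file has been torn down */
};

typedef void (*model_cb)(struct model_attr *);

/* One-slot stand-in for a workqueue. */
static model_cb pending_cb;
static struct model_attr *pending_arg;

/*
 * Teardown must wait for all running handlers to finish.  The model
 * returns false where real code would sleep.
 */
static bool model_try_remove(struct model_attr *a)
{
	if (a->active > 0)
		return false;
	a->removed = true;
	return true;
}

/* Deferred work: by now the handler has returned, so removal succeeds. */
static void model_remove_cb(struct model_attr *a)
{
	bool ok = model_try_remove(a);

	assert(ok);
	(void)ok;
}

/* Direct removal from inside the handler: it waits on itself. */
static bool model_store_sync(struct model_attr *a)
{
	bool ok;

	a->active++;			/* handler entered */
	ok = model_try_remove(a);	/* cannot succeed while we are active */
	a->active--;			/* handler returns */
	return ok;
}

/* Deferred removal: queue the work and let the handler return first. */
static void model_store_deferred(struct model_attr *a)
{
	a->active++;
	pending_cb = model_remove_cb;
	pending_arg = a;
	a->active--;
}

/* Runs later, in another context, after the handler has returned. */
static void model_run_pending(void)
{
	if (pending_cb) {
		model_cb cb = pending_cb;

		pending_cb = NULL;
		cb(pending_arg);
	}
}
```

In this model, model_store_sync() never manages to remove the file, while model_store_deferred() followed by model_run_pending() does. If that is indeed the reason, a two-line comment above remove_store() saying so would be enough.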

Do we need the CAP_SYS_ADMIN check if the sysfs file permissions are
correct? (I keep on asking this then forgetting the answer).

The device_schedule_callback() thing exposes us to (I assume) a pile of
races, the most obvious of which is "what locking or refcounting keeps
*dev alive?". It would be nice to see an analysis/description of the
lifetime issues here. Perhaps in the changelog, preferably in code
comments.
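
To make the lifetime question concrete, here is a minimal user-space sketch of the usual answer: whoever queues the callback takes a reference on the object and drops it only after the callback has run. Whether device_schedule_callback() actually does this on the caller's behalf is not established anywhere in this thread, so treat that as an assumption to be checked. The model_* names are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device; only the refcount matters. */
struct model_dev {
	int refs;
	bool *freed;	/* lets the caller observe the release */
};

static struct model_dev *model_dev_new(bool *freed_flag)
{
	struct model_dev *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refs = 1;		/* reference held by the creator */
	d->freed = freed_flag;
	*freed_flag = false;
	return d;
}

static void model_get(struct model_dev *d)
{
	d->refs++;
}

static void model_put(struct model_dev *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0) {
		*d->freed = true;
		free(d);
	}
}

struct model_work {
	void (*func)(struct model_dev *);
	struct model_dev *dev;
};

/* Queuing pins the device so it cannot be freed before func runs. */
static void model_schedule(struct model_work *w, struct model_dev *d,
			   void (*func)(struct model_dev *))
{
	model_get(d);
	w->func = func;
	w->dev = d;
}

/* Run the callback, then drop the reference taken at schedule time. */
static void model_run(struct model_work *w)
{
	w->func(w->dev);
	model_put(w->dev);
	w->dev = NULL;
}

static int callback_saw_refs;

/* Records the refcount to show the device is still alive when called. */
static void model_remove_callback(struct model_dev *d)
{
	callback_saw_refs = d->refs;
}
```

Here the creator can drop its own reference between model_schedule() and model_run() without the callback ever seeing freed memory. The sketch says nothing about what the callback may touch after it has removed the device, which is a separate question.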
