Re: Possible regression with cgroups in 3.11

From: Yinghai Lu
Date: Mon Nov 18 2013 - 14:29:40 EST


On Mon, Nov 18, 2013 at 10:14 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>> A bit of comment here would be nice but yeah I think this should work.
>> Can you please also queue the revert of c2fda509667b ("workqueue:
>> allow work_on_cpu() to be called recursively") after this patch?
>> Please feel free to add my acked-by.
>
> OK, below are the two patches (Alex's fix + the revert) I propose to
> merge. Unless there are objections, I'll ask Linus to pull these
> before v3.13-rc1.
>
>
>
> commit 84f23f99b507c2c9247f47d3db0f71a3fd65e3a3
> Author: Alexander Duyck <alexander.h.duyck@xxxxxxxxx>
> Date: Mon Nov 18 10:59:59 2013 -0700
>
> PCI: Avoid unnecessary CPU switch when calling driver .probe() method
>
> If we are already on a CPU local to the device, call the driver .probe()
> method directly without using work_on_cpu().
>
> This is a workaround for a lockdep warning in the following scenario:
>
>   pci_call_probe
>     work_on_cpu(cpu, local_pci_probe, ...)
>       driver .probe
>         pci_enable_sriov
>           ...
>             pci_bus_add_device
>               ...
>                 pci_call_probe
>                   work_on_cpu(cpu, local_pci_probe, ...)
>
> It would be better to fix PCI so we don't call VF driver .probe() methods
> from inside a PF driver .probe() method, but that's a bigger project.
>
> [bhelgaas: disable preemption, open bugzilla, rework comments & changelog]
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=65071
> Link: http://lkml.kernel.org/r/CAE9FiQXYQEAZ=0sG6+2OdffBqfLS9MpoN1xviRR9aDbxPxcKxQ@xxxxxxxxxxxxxx
> Link: http://lkml.kernel.org/r/20130624195942.40795.27292.stgit@xxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxx>
> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Tested-by: Yinghai Lu <yinghai@xxxxxxxxxx>
Acked-by: Yinghai Lu <yinghai@xxxxxxxxxx>

>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 9042fdbd7244..add04e70ac2a 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -288,12 +288,24 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>  	int error, node;
>  	struct drv_dev_and_id ddi = { drv, dev, id };
>
> -	/* Execute driver initialization on node where the device's
> -	   bus is attached to. This way the driver likely allocates
> -	   its local memory on the right node without any need to
> -	   change it. */
> +	/*
> +	 * Execute driver initialization on node where the device is
> +	 * attached. This way the driver likely allocates its local memory
> +	 * on the right node.
> +	 */
>  	node = dev_to_node(&dev->dev);
> -	if (node >= 0) {
> +	preempt_disable();
> +
> +	/*
> +	 * On NUMA systems, we are likely to call a PF probe function using
> +	 * work_on_cpu(). If that probe calls pci_enable_sriov() (which
> +	 * adds the VF devices via pci_bus_add_device()), we may re-enter
> +	 * this function to call the VF probe function. Calling
> +	 * work_on_cpu() again will cause a lockdep warning. Since VFs are
> +	 * always on the same node as the PF, we can work around this by
> +	 * avoiding work_on_cpu() when we're already on the correct node.
> +	 */
> +	if (node >= 0 && node != numa_node_id()) {
>  		int cpu;
>
>  		get_online_cpus();
> @@ -305,6 +317,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>  		put_online_cpus();
>  	} else
>  		error = local_pci_probe(&ddi);
> +
> +	preempt_enable();
>  	return error;
>  }