Re: [PATCH] try parent numa_node at first before using default

From: Greg KH
Date: Thu Jul 12 2007 - 14:38:18 EST


On Thu, Jul 12, 2007 at 10:59:53AM -0700, Yinghai Lu wrote:
> [PATCH] try parent numa_node at first before using default
>
> For pci_device, pcibios_scan_root and pci_scan_root will call pci_device_add.
> pci_device_add will call device_initialize and set_dev_node(&dev->dev,
> pcibus_to_node(bus)).
> other device such as netdev, and usb_device, set_dev_node is never be
> used. So that field numa_node always is -1.
>
> So for netdev, it will need to use dev->parent to get pci_device to
> use it's numa_node. esp in netdev_alloc_skb()
> not sure how other device such as infiniband do that.
>
> Actually before patch
> [PATCH 1/2] x86_64: get mp_bus_to_node as early
> there is a bug about squence of bus->sysdata and using pcibus_to_node.
> the numa_node of pci_dev->dev is never set correctly...always 0.
>
> So some device have to use pcibus_to_node(to_pci_dev(dev)->bus) directly
> such as dma_alloc_pages in arch/x86_64/kernel/pci-dma.c.
> or hwif_to_node in include/linux/ide.h
>
> According to Stefan Richter
> - Change all subsystems to set dev->parent before device_initialize().
> *Document* that the device_initialize() API has this requirement.
> This is counter-intuitive, amounts to some work across the kernel,
> and could be gotten wrong again in future code because it's a
> counter-intuitive API.
>
> - Move your code from device_initialize() to device_add(). One minor
> drawback is that node-specific allocations based on the device's
> numa_node would not be optimized before device_add(), but there is
> probably no need for this. Driver probes come after device_add().
>
> - Let subsystems explicitly call set_dev_node() on their own.
>
> this patch is using second method.
>
> Also we don't need call set_dev_node in pci_device_add anymore. but need to
> make sure every pci root bus's bridge device numa is set.
>
> with this patch, we could use device->numa_node direclty for all device.

Please split this into two separate patches, as they are doing two
different things. One for the driver core, and one for pci devices.

Also, what kind of performance issues have you seen with this patch in
place? As your previous patches were incorrect, I am guessing that you
are not able to test this?

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/