Re: [PATCH] arm64: PCI: Remove node-local allocations when initialising host controller

From: Bjorn Helgaas
Date: Wed Aug 08 2018 - 13:22:17 EST


On Wed, Aug 08, 2018 at 03:44:03PM +0100, Punit Agrawal wrote:
> Bjorn Helgaas <bhelgaas@xxxxxxxxxx> writes:
> > On Thu, Aug 2, 2018 at 9:33 AM Lorenzo Pieralisi
> > <lorenzo.pieralisi@xxxxxxx> wrote:
> >> On Wed, Aug 01, 2018 at 02:38:51PM -0500, Jeremy Linton wrote:
> >>
> >> Jiang Liu does not work on the kernel anymore, so we won't know
> >> anytime soon the reasoning behind commit 965cd0e4a5e5.
> >>
> >> > On 08/01/2018 12:31 PM, Punit Agrawal wrote:
> >> > >Memory for host controller data structures is allocated local to the
> >> > >node with which the controller is associated. This has been the
> >> > >behaviour since support for ACPI was added in
> >> > >commit 0cb0786bac15 ("ARM64: PCI: Support ACPI-based PCI host controller").
> >> >
> >> > Which was apparently influenced by:
> >> >
> >> > 965cd0e4a5e5 x86, PCI, ACPI: Use kmalloc_node() to optimize for performance
> >> >
> >> > Was there an actual use-case behind that change?
> >> >
> >> > I think this fixes the immediate boot problem, but if there is any
> >> > perf advantage it seems wise to keep it... Particularly since x86
> >> > seems to be doing the node sanitization in pci_acpi_root_get_node().
> >>
> >> I am struggling to see the perf advantage of allocating a struct
> >> that the PCI controller will never read/write from a NUMA node that
> >> is local to the PCI controller, happy to be corrected if there is
> >> a sound rationale behind that.
> >
> > If there is no reason to use kzalloc_node() here, we shouldn't use it.
> >
> > But we should use it (or not use it) consistently across arches. I do
> > not believe there is an arch-specific reason to be different.
> > Currently, pci_acpi_scan_root() uses kzalloc_node() on x86 and arm64,
> > but kzalloc() on ia64. They all ought to be the same.
>
> From my understanding, arm64 use of kzalloc_node() was derived from the
> x86 version. Maybe somebody familiar with behaviour on x86 can provide
> input here.

If you want to remove use of kzalloc_node(), I'm fine with that as
long as you do it for x86 at the same time (maybe separate patches,
but at least in the same series).

I don't see any evidence in 965cd0e4a5e5 ("x86, PCI, ACPI: Use
kmalloc_node() to optimize for performance") that it actually improves
performance, so I'd be inclined to just use kzalloc().
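
For illustration only, the two allocation styles being compared look
roughly like the sketch below. This is a userspace stand-in, not the
actual patch: kzalloc_stub(), kzalloc_node_stub(), gfp_stub_t,
GFP_KERNEL_STUB and struct root_info_sketch are hypothetical names that
only mimic the shape of the kernel's kzalloc(size, flags) and
kzalloc_node(size, flags, node).

```c
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel allocators. */
typedef unsigned int gfp_stub_t;
#define GFP_KERNEL_STUB 0u

/* Mimics kzalloc(): zeroed memory, no NUMA node preference. */
static void *kzalloc_stub(size_t size, gfp_stub_t flags)
{
	(void)flags;
	return calloc(1, size);
}

/* Mimics kzalloc_node(): same, but the caller names a preferred node.
 * This stub ignores the node; a real allocator would try to honour it. */
static void *kzalloc_node_stub(size_t size, gfp_stub_t flags, int node)
{
	(void)flags;
	(void)node;
	return calloc(1, size);
}

/* Hypothetical placeholder for the per-host-controller structure. */
struct root_info_sketch {
	int segment;
	int node;
};

/* Node-local style: the caller must pass a usable node. */
static struct root_info_sketch *alloc_root_info_node(int node)
{
	struct root_info_sketch *ri;

	ri = kzalloc_node_stub(sizeof(*ri), GFP_KERNEL_STUB, node);
	if (ri)
		ri->node = node;
	return ri;
}

/* Plain style: no node argument, so nothing to validate. */
static struct root_info_sketch *alloc_root_info(void)
{
	return kzalloc_stub(sizeof(struct root_info_sketch), GFP_KERNEL_STUB);
}
```

The plain variant has one less input to get right, which is the
attraction if the node-local variant brings no measurable benefit.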

Bjorn