Re: [RFC PATCH v4 11/13] mm: parallelize deferred struct page initialization within each node

From: Daniel Jordan
Date: Mon Nov 12 2018 - 11:54:45 EST


On Sat, Nov 10, 2018 at 03:48:14AM +0000, Elliott, Robert (Persistent Memory) wrote:
> > -----Original Message-----
> > From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-owner@xxxxxxxxxxxxxxx> On Behalf Of Daniel Jordan
> > Sent: Monday, November 05, 2018 10:56 AM
> > Subject: [RFC PATCH v4 11/13] mm: parallelize deferred struct page initialization within each node
> >
> > ... The kernel doesn't
> > know the memory bandwidth of a given system to get the most efficient
> > number of threads, so there's some guesswork involved.
>
> The ACPI HMAT (Heterogeneous Memory Attribute Table) is designed to report
> that kind of information, and could facilitate automatic tuning.
>
> There was discussion last year about kernel support for it:
> https://lore.kernel.org/lkml/20171214021019.13579-1-ross.zwisler@xxxxxxxxxxxxxxx/

Thanks for bringing this up. I'm traveling but will take a closer look when I
get back.

> > In testing, a reasonable value turned out to be about a quarter of the
> > CPUs on the node.
> ...
> > + /*
> > + * We'd like to know the memory bandwidth of the chip to calculate the
> > + * most efficient number of threads to start, but we can't.
> > + * In testing, a good value for a variety of systems was a quarter of the CPUs on the node.
> > + */
> > + nr_node_cpus = DIV_ROUND_UP(cpumask_weight(cpumask), 4);
>
>
> You might want to base that calculation on and limit the threads to
> physical cores, not hyperthreaded cores.

Why? Hyperthreads can be beneficial when waiting on memory. That said, I
don't have data that shows that in this case.
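
To make the two options concrete, here is a rough userspace sketch of the arithmetic, not code from the patch. threads_from_logical() mirrors the quarter-of-the-CPUs heuristic quoted above. threads_from_cores() and its smt_width parameter are hypothetical names, and it assumes every core on the node has the same number of hardware threads.

```c
#include <assert.h>

/* Same rounding as the kernel's DIV_ROUND_UP macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Heuristic from the quoted patch: a quarter of the node's logical
 * CPUs, rounded up.
 */
static unsigned int threads_from_logical(unsigned int nr_logical_cpus)
{
	return DIV_ROUND_UP(nr_logical_cpus, 4);
}

/*
 * Hypothetical variant for the suggestion above: count one CPU per
 * physical core first, then take a quarter.  smt_width is the number
 * of hardware threads per core and is assumed to be uniform.
 */
static unsigned int threads_from_cores(unsigned int nr_logical_cpus,
				       unsigned int smt_width)
{
	unsigned int nr_cores;

	assert(smt_width > 0);
	nr_cores = DIV_ROUND_UP(nr_logical_cpus, smt_width);
	return DIV_ROUND_UP(nr_cores, 4);
}
```

On a node with 48 logical CPUs and 2 threads per core, the first function gives 12 threads and the second gives 6. Whether that halving helps or hurts is the open question, since no data has been posted either way.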