Re: [PATCH -V2 2/2] autonuma: Migrate on fault among multiple bound nodes

From: Huang, Ying
Date: Fri Nov 06 2020 - 02:29:13 EST


Mel Gorman <mgorman@xxxxxxx> writes:

> On Wed, Nov 04, 2020 at 01:36:58PM +0800, Huang, Ying wrote:
>> But from another point of view, I suggest to remove the constraints of
>> MPOL_F_MOF in the future. If the overhead of AutoNUMA isn't acceptable,
>> why not just disable AutoNUMA globally via sysctl knob?
>>
>
> Because it's a double edged sword. NUMA Balancing can make a workload
> faster while still incurring more overhead than it should -- particularly
> when threads are involved rescanning the same or unrelated regions.
> Global disabling only really should happen when an application is running
> that is the only application on the machine and has full NUMA awareness.

Got it.  So NUMA Balancing may in general benefit some workloads but
hurt others on the same machine, so we need a way to enable/disable
NUMA Balancing per workload.  Previously, this was done via explicit
NUMA policy: if an explicit NUMA policy is specified, NUMA Balancing is
disabled for that memory region or thread.  This can be reverted for a
memory region via MPOL_MF_LAZY, but it appears that we still lack an
equivalent of MPOL_MF_LAZY for the thread.

>> > It might still end up being better but I was not aware of a
>> > *realistic* workload that binds to multiple nodes
>> > deliberately. Generally I expect if an application is binding, it's
>> > binding to one local node.
>>
>> Yes.  It's not a popular configuration for now.  But for the memory
>> tiering system with both DRAM and PMEM, the DRAM and PMEM in one socket
>> will become 2 NUMA nodes. To avoid too much cross-socket memory
>> accessing, but take advantage of both the DRAM and PMEM, the workload
>> can be bound to 2 NUMA nodes (DRAM and PMEM).
>>
>
> Ok, that may lead to unpredictable performance as it'll have variable
> performance with limited control of the "important" applications that
> should use DRAM over PMEM. That's a long road but the step is not
> incompatible with the long-term goal.

Yes.  Ben Widawsky is working on a patchset that makes it possible to
prefer remote DRAM over local PMEM:

https://lore.kernel.org/linux-mm/20200630212517.308045-1-ben.widawsky@xxxxxxxxx/

Best Regards,
Huang, Ying