Re: [PATCH] mm: numa: Do not trap faults on shared data section pages.

From: Christopher Lameter
Date: Fri Jan 19 2018 - 21:13:10 EST


On Thu, 18 Jan 2018, Henry Willard wrote:

> If MPOL_MF_LAZY were allowed and specified things would not work
> correctly. change_pte_range() is unaware of and can't honor the
> difference between MPOL_MF_MOVE_ALL and MPOL_MF_MOVE.

Not sure how that relates to what I said earlier... Sorry.

>
> For the case of auto numa balancing, it may be undesirable for shared
> pages to be migrated whether they are also copy-on-write or not. The
> copy-on-write test was added to restrict the effect of the patch to the
> specific situation we observed. Perhaps I should remove it, I don't
> understand why it would be desirable to modify the behavior via sysfs.

I think the most common case of shared pages occurs for pages that contain
code. In that case a page may be mapped into hundreds if not thousands of
processes. In particular that is often the case for basic system libraries
like the C library, which may actually be mapped into every binary that is
running.

It is very difficult and expensive to unmap these pages from all the
processes in order to migrate them. So some sort of limit would be useful
to avoid unnecessary migration attempts. One example would be to forbid
migrating pages that are mapped in more than 5 processes. A sysctl knob
would be useful here to set the boundary.
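A minimal sketch of the proposed policy, written as plain userspace C rather than kernel code. The tunable `numa_migrate_max_mapcount` and the helper `numa_page_migratable()` are hypothetical names invented for illustration; no such knob is being claimed to exist. In the kernel the count would come from something like page_mapcount(); here it is just an int parameter.

```c
#include <stdbool.h>

/*
 * Hypothetical tunable standing in for the proposed sysctl knob.
 * Name and default (5) are illustrative assumptions only.
 */
static int numa_migrate_max_mapcount = 5;

/*
 * Return true if a page mapped by `mapcount` processes should still be
 * considered for NUMA migration. Pages shared more widely than the limit
 * (e.g. C library text mapped into nearly every process) are skipped,
 * since unmapping them from all of those processes is expensive.
 */
static bool numa_page_migratable(int mapcount, int max_mapcount)
{
	return mapcount <= max_mapcount;
}
```

With this framing, the patch under discussion (refuse when mapcount != 1) behaves like a fixed limit of 1: any page mapped by two or more processes is never migrated.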

Your patch addresses a special case here by forbidding migration of any
page mapped by more than a single process (mapcount != 1).

That would mean f.e. that the complete migration of a set of processes
that rely on sharing data via a memory segment is impossible because those
shared pages can never be moved.

With a higher limit, that migration would still be possible.

Maybe we can set that limit by default at 5 and allow a higher setting
if users have applications that require a higher mapcount? F.e. a
common construct is a shepherd task and N worker threads. If those
tasks each have their own address space and only communicate via
a shared data segment, then one may want to set the limit higher than N
in order to allow the migration of the group of processes.
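The arithmetic behind "higher than N" can be sketched as follows, assuming the same "migrate only if mapcount <= limit" rule. The shepherd and each of the N workers map the shared segment, so every page in it has a mapcount of N + 1, and the limit must be at least N + 1. Both helper names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical policy: migrate a page only if mapcount <= limit. */
static bool group_page_migratable(int mapcount, int limit)
{
	return mapcount <= limit;
}

/*
 * One shepherd plus n_workers processes, each with its own address
 * space, all map the shared data segment: mapcount = n_workers + 1.
 */
static int shared_segment_mapcount(int n_workers)
{
	return n_workers + 1;
}
```

At the default limit of 5, a shepherd with 4 workers (mapcount 5) can still be migrated as a group, while a shepherd with 5 workers (mapcount 6) needs the limit raised to at least 6.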