Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

From: Michal Hocko
Date: Mon Oct 16 2017 - 04:25:04 EST


On Sun 15-10-17 10:50:29, Guy Shattah wrote:
>
>
> On 13/10/2017 19:17, Michal Hocko wrote:
> > On Fri 13-10-17 10:56:13, Cristopher Lameter wrote:
> > > On Fri, 13 Oct 2017, Michal Hocko wrote:
> > >
> > > > > > There is a generic posix interface that could be used for a variety of
> > > > > specific hardware dependent use cases.
> > > > Yes you wrote that already and my counter argument was that this generic
> > > > posix interface shouldn't bypass virtual memory abstraction.
> > > It does do that? In what way?
> > availability of the virtual address space depends on the availability of
> > the same sized contiguous physical memory range. That sounds like the
> > abstraction is gone to large part to me.
>
> In what way? userspace users will still be working with virtual memory.

So you are saying that providing an API which fails randomly because of
the physically fragmented memory is OK? Users shouldn't really care
about the state of the physical memory. That is what we have the virtual
memory for.

> > > > > There are numerous RDMA devices that would all need the mmap
> > > > > implementation. And this covers only the needs of one subsystem. There are
> > > > > other use cases.
> > > > That doesn't prevent providing a library function which could be reused
> > > > by all those drivers. Nothing really too much different from
> > > > remap_pfn_range.
> > > And then in all the other use cases as well. It would be much easier if
> > > > mmap could give you the memory you need instead of having numerous drivers
> > > improvise on their own. This is in particular also useful
> > > for numerous embedded use cases where you need contiguous memory.
> > But a generic implementation would have to deal with many issues as
> > already mentioned. If you make this driver specific you can have access
> > control based on fd etc... I really fail to see how this is any
> > different from remap_pfn_range.
> Why have several driver specific implementations if you can generalize the
> idea and implement an already existing POSIX standard?

Because users shouldn't really care, really. We do have means to get
large memory and having a guaranteed large memory is a PITA. Just look
at hugetlb and all the issues it exposes. And that one is preallocated
and it requires the admin to make a conscious decision about the amount
of memory. You would like to establish something similar except without
bounds to the size and no pre-allowed amount by an admin. This sounds
just crazy to me.

On the other hand if you make this per-device mmap implementation you
can have both admin defined policy on who is allowed this memory and
moreover drivers can implement their fallback strategies which best suit
their needs. I really fail to see how this is any different from using
specialized mmap implementations.
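
To make the fd-based approach concrete, here is a minimal userspace
sketch. It assumes a hypothetical device node whose driver backs mmap()
with contiguous memory; the path and the map_buffer() helper are made up
for illustration and do not refer to any existing driver. Access control
falls out of the permissions on the device node. The fallback shown here
is done by the caller, whereas a real driver could equally fall back
inside its own mmap handler.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Try a device-backed mapping first; fall back to ordinary anonymous
 * memory if the device is missing, access is denied, or the driver
 * cannot satisfy the request. dev_path is a hypothetical device node.
 * Returns the mapping or MAP_FAILED; *from_device reports which path won.
 */
static void *map_buffer(const char *dev_path, size_t len, int *from_device)
{
	void *p;
	int fd;

	assert(len > 0);
	*from_device = 0;

	/* Access control is simply the permissions on the device node. */
	fd = open(dev_path, O_RDWR);
	if (fd >= 0) {
		/* The driver's mmap handler decides how to back this range. */
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		close(fd);	/* the mapping stays valid after close */
		if (p != MAP_FAILED) {
			*from_device = 1;
			return p;
		}
	}

	/* Fallback: plain virtual memory, no contiguity guarantee. */
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

A caller that can live without contiguity checks from_device and picks
its slow path accordingly; one that cannot treats from_device == 0 as an
error and unmaps the buffer.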

I might be really wrong but I consider such a general purpose flag quite
dangerous and a future maintenance burden. At least from the hugetlb/THP
history I do not see why this should be any different.
--
Michal Hocko
SUSE Labs