Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

From: Hugh Dickins
Date: Wed Jun 27 2007 - 13:43:47 EST


On Wed, 27 Jun 2007, Ulrich Drepper wrote:
> On 6/27/07, Hugh Dickins <hugh@xxxxxxxxxxx> wrote:
> > Not so: if an mmap can be done by extending either adjacent vma (prot
> > and flags and file and offset all match up), that's what's done and no
> > separate vma is created. (And adjacent vmas get merged when mprotect
> > removes the difference in protection.)
>
> mmap return values are randomized. If they would be mergable
> something would be wrong.

The absolute virtual addresses are randomized, yes; but do a sequence
of mmaps and they come back adjacent to each other, and so mergable.
Or does your distro include a kernel patch to randomize them relative
to each other?

> > I don't think there's any such reason to prefer brk to mmap.
>
> Talk to the Quadrics people. Some of their interconnect adapters
> incur significant costs with separate VMAs.
>
> > Please let me know if you've a test case which shows more vmas than
> > expected.
>
> Not related to this, but we already have "too many" VMAs for some
> definition of "too many". Since we split VMA when the protection
> changes each thread stack consists at least of two VMA (three for
> ia64). There was an approach at some point to push the access flags
> down and allow VMAs to stretch further. Why was this deemed
> unsuitable?

Blaisorblade's patch, yes. I disliked it at the time I reviewed
it a couple of years back: it made a system call many of us regret
even nastier, and relied on tiresome per-arch pagetable encodings.
It seemed likely to cause us trouble down the line - though
undoubtedly it provides real benefits too.

I never found time to review it when it came around again recently,
thought Blaisorblade might prefer me to turn a blind eye to it ;)
and let others opine instead. But I think noone else found time to
review it either. I expect it's rather nicer now, but doubt it's
actually attractive.

It _might_ turn out to be more attractive, not to rely on that
peculiar sys_remap_file_pages, but make all our vmas independent
of protections, and hang differently protected ranges off them.
Maybe.

> It could have quite a bit with situations with many
> threads (Java). As I've told back when this came up, we had one
> customer where the VMA lookup actually showed up in profiles.

In honesty, I should add that I dislike and distrust Davide's
MAP_NOZERO very much indeed! Would much rather leave my cpus
spending a little time in clear_page(). A uid in struct page
(though I'm sure we could find somewhere to tuck it away) -
the horror, the horror! But I've so far failed to find a killer
argument against it, and am hoping for someone else to do so.

Curmudgeonly,
Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/