[RFC] [patch 0/18] remap_file_pages protection support (for UML), try 3

From: Blaisorblade
Date: Fri Aug 26 2005 - 13:31:03 EST


This is a followup to my post of last week (Aug 12) about remap_file_pages
protection support. I've improved and consolidated the patches and updated
them against 2.6.13-rc6/rc7 (the same patches apply against both versions).
I'm sending the full patch series only to akpm, mingo and LKML.

I've also reduced them to only 18, and split the series more meaningfully.
I'm not resending all the patches for foreign architectures, because they're
almost unchanged since last time (there's just a trivial reject from ppc32,
because one change has already been done after -rc4).

I'm working on this to provide support for UML, which currently easily creates
more than 64K VMAs (the default limit) for a single process; in fact, it
needs one VMA per page. This patch is meant to go together with specific UML
support, which Ingo wrote and which I'm porting to recent UMLs.
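
To make the "one VMA per page" problem concrete, here is a minimal userspace
sketch (not part of the patch series). It assumes a Linux system with
/proc/self/maps available; the function names count_vmas() and
vmas_gained_by_per_page_prot() are made up for this illustration. It maps a few
anonymous pages and gives every other page a different protection with plain
mprotect(), which forces the kernel to split the mapping into roughly one VMA
per page.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the VMAs of the calling process by counting the lines of
 * /proc/self/maps. Returns -1 if the file cannot be opened. */
int count_vmas(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	int c, lines = 0;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			lines++;
	fclose(f);
	return lines;
}

/* Map npages anonymous pages read-write, then make every other page
 * read-only. Returns how many VMAs the process gained, or -1 on error. */
int vmas_gained_by_per_page_prot(int npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t len = (size_t)npages * (size_t)psz;
	int before, after, i;
	char *base;

	before = count_vmas();
	if (before < 0)
		return -1;

	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* Neighbouring pages now differ in protection, so they can no
	 * longer share a VMA. */
	for (i = 1; i < npages; i += 2) {
		if (mprotect(base + (size_t)i * psz, psz, PROT_READ) < 0) {
			munmap(base, len);
			return -1;
		}
	}

	after = count_vmas();
	munmap(base, len);
	return after < 0 ? -1 : after - before;
}
```

Calling vmas_gained_by_per_page_prot(16) should report a gain close to 16
(the exact number depends on merging with neighbouring mappings), which is why
a process doing this for every page runs into the VMA limit quickly.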

Some highlights:

* The first 2 patches modify the PTE encoding macros and start preparing the
VM for the new situation (i.e. VMAs which have variable protections, which are
marked VM_NONUNIFORM; I dropped the earlier VM_MANYPROTS name).

Patch number 2 will require fixing up all arches like in 2.6.4-rc2-mm1, to
provide the new PTE encoding macros.

* Patch 5 allows the syscall to actually create such VMAs. Before that,
there's no difference in behaviour with the current kernel (except that
there's less space for file offset encoding in PTEs). And even here, the new
operations are only enabled for archs explicitly supporting them (see patch #7).
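
As an illustration of why protections cost file offset bits, here is a sketch
of packing a page offset plus protection bits into a non-present 32-bit PTE.
The bit layout and all sk_* names are hypothetical and chosen only for this
example; the real macros and layouts are per-architecture and are defined in
the patches themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a non-present "file" PTE (illustration only):
 *   bit  0      present bit, always clear here
 *   bit  1      marks the entry as a file PTE
 *   bits 2-4    protection (read/write/exec)
 *   bits 5-31   file page offset (27 bits instead of 30)
 */
#define SK_PTE_PRESENT	0x1u
#define SK_PTE_FILE	0x2u
#define SK_PROT_SHIFT	2
#define SK_PROT_MASK	0x7u
#define SK_PGOFF_SHIFT	5
#define SK_PGOFF_BITS	27
#define SK_PGOFF_MAX	((1u << SK_PGOFF_BITS) - 1)

#define SK_PROT_READ	0x1u
#define SK_PROT_WRITE	0x2u
#define SK_PROT_EXEC	0x4u

/* Pack a file page offset and protection bits into one entry. */
uint32_t sk_pgoff_prot_to_pte(uint32_t pgoff, uint32_t prot)
{
	assert(pgoff <= SK_PGOFF_MAX);		/* offset must fit in 27 bits */
	assert((prot & ~SK_PROT_MASK) == 0);	/* only R/W/X are encodable */
	return (pgoff << SK_PGOFF_SHIFT) | (prot << SK_PROT_SHIFT) | SK_PTE_FILE;
}

/* Recover the file page offset from an encoded entry. */
uint32_t sk_pte_to_pgoff(uint32_t pte)
{
	return pte >> SK_PGOFF_SHIFT;
}

/* Recover the protection bits from an encoded entry. */
uint32_t sk_pte_to_prot(uint32_t pte)
{
	return (pte >> SK_PROT_SHIFT) & SK_PROT_MASK;
}

/* True if the entry is a non-present file PTE. */
int sk_pte_is_file(uint32_t pte)
{
	return !(pte & SK_PTE_PRESENT) && (pte & SK_PTE_FILE);
}
```

In this made-up layout the three protection bits come straight out of the
offset field, so the largest file offset that can be remapped shrinks by a
factor of 8.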

* Patches 8 and 9 change the path for handling page faults, since the
permission checking on nonuniform vmas cannot be done until the PTE has been
read.

This is the most intrusive part, but
a) archs are not required to adapt to this immediately
b) it isn't so difficult in practice.
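
The change in ordering can be sketched in plain C as follows. This is a
simplified model, not the kernel code: struct sk_vma, struct sk_pte,
SK_VM_NONUNIFORM and sk_access_ok() are hypothetical names, and falling back
to the VMA default when the page carries no protection of its own is an
assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

enum { SK_READ = 1, SK_WRITE = 2, SK_EXEC = 4 };

#define SK_VM_NONUNIFORM 0x100u	/* hypothetical flag value */

struct sk_vma {
	unsigned flags;		/* may contain SK_VM_NONUNIFORM */
	unsigned prot;		/* default protection of the whole VMA */
};

struct sk_pte {
	bool has_prot;		/* true if this page carries its own protection */
	unsigned prot;		/* per-page protection, valid if has_prot */
};

/* Decide whether a faulting access is permitted. pte may be NULL when no
 * entry exists for the page yet. */
bool sk_access_ok(const struct sk_vma *vma, const struct sk_pte *pte,
		  unsigned access)
{
	/* Uniform VMA: the VMA alone answers the question, so the check
	 * can happen early, before any page table is touched. */
	if (!(vma->flags & SK_VM_NONUNIFORM))
		return (vma->prot & access) == access;

	/* Nonuniform VMA: the answer lives in the PTE, so the check has to
	 * wait until the entry has been read. */
	if (pte != NULL && pte->has_prot)
		return (pte->prot & access) == access;

	/* Assumption: a page that was never remapped keeps the VMA default. */
	return (vma->prot & access) == access;
}
```

For the UML use case described in this mail (a PROT_NONE mapping followed by
per-page remappings), vma->prot would be 0 and every permitted access would be
granted by the per-page branch.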

* Patch 11 is a big simplification. Since we must encode the PTE's on swapout
like in VM_NONLINEAR vmas, the simplest way to reuse the existing code is to
make sure that VM_NONUNIFORM vmas are also marked as VM_NONLINEAR.

It is possible to avoid this, as in patch #18, but it's just a bit scary, and

Then there are 4 optimization patches and 3 fixups for some odd cases that we
may not end up supporting, namely:
*) vmas with default PROT_NONE protection (I actually feel we're going to
support this, the only patch which has problems is an optimization)

*) MAP_POPULATE on private VMA (no problem on this) and consequently
remap_file_pages on private VMA to install linear uniform mappings (since
MAP_POPULATE is implemented in terms of remap_file_pages): there's a patch to
stop this from truncating COW pages away, but I don't think it's worth it.

*) linear nonuniform vmas. I initially created them because there's no
relation between being nonlinear and nonuniform, but it later turned out
supporting them is intrusive.

I have improved the patches further, understood some of Ingo's changes which I
hadn't understood last time, and fixed their bugs.

I hope these changes can be reviewed and included in -mm, even though they'll
conflict with the pagefault scalability patches (I think the conflicts are not
difficult to solve).

Still, the patch is IMHO in better shape, in many ways, than when it was in
-mm last time. To handle properly all possibilities it has become a bit more
intrusive.

The original one was designed to handle only the simpler needs of
UML (an mmap with PROT_NONE followed by nonlinear and nonuniform
remappings), but it still failed in some cases. I've taken Ingo's original
test program and significantly extended it; it's attached to this mail.

I'll appreciate any comments.

==============
Changes from 2.6.5-mm1/dropped version of the patches:
==============
*) Actually implemented _real_ and _anal_ protection support, safe against
swapout; programs get SIGSEGV *always* when they should. I've used the
attached test program (an improved version of Ingo's one) to check that.
I tested only up to patch 25, on UML. The subsequent ones are either patches
for foreign archs or proposed

*) Fixed many changes present in the patches.
*) Fixed UML bits
*) Added some headaches for arch ports. I've also included some patches
which reduce them.

*) No more usage of a new syscall slot: to use the new interface, applications
will use the new MAP_NOINHERIT flag I've added. I still have the patches to
use the old -mm ABI, if there's any reason they're needed.

*) Fixed a regression wrt using mprotect() against a remapped area (see patch
15)

======
Changes from my last patch-bomb of the patches:
======
*) fixed mprotect VS remap_file_pages(MAP_NOINHERIT) interaction

*) fixed truncation (with madvise_dontneed or truncate()) of nonuniform but
linear vmas. Either with patch 11, by removing "nonuniform but linear VMAs",
or with patch 18.

======
Still todo
======
*) ->populate flushes each TLB entry individually, instead of using
mmu_gathers as it should; this was suggested even by Ingo when sending the
patch, but it seems he didn't get the time to finish this. And I'm now
wondering how that would relate to I/O... at each I/O point we should finish
and regather the mmu_gather, as in zap_page_range. But here we are reading
pages, not the reverse!

Seems rewriting the kernel locking is a quite time-consuming task!
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

Attachment: fremap-test-complete.c.bz2
Description: BZip2 compressed data