Re: [RFC PATCH 0/4] Support vranges on files

From: John Stultz
Date: Mon Apr 08 2013 - 20:36:59 EST


On 04/07/2013 05:46 PM, Minchan Kim wrote:
Hello John,

As you know, userland people wanted to handle vrange with mmapped
pointers rather than an fd-based interface, and to see SIGBUS, so I
thought more about the semantics of vrange and want to make them very
clear and simple. So I suggest the semantics below (of course, they're
not rock solid).

mvrange(start_addr, length, mode, behavior)

It's the same as what I suggested lately but with a different name,
just adding the prefix "m". It's a per-process model (i.e., mm_struct
vrange), so once the process has exited, the "volatility" isn't valid
any more. That isn't a problem for anonymous pages but could be for
file-vrange, so let's introduce fvrange to cover that problem.

fvrange(int fd, start_offset, length, mode, behavior)

First of all, let's look at mvrange from the anonymous and file page POV.

1) anon-mvrange

A page in a volatile range will be purged only if all of the processes
have marked the range as volatile.

If a process calls mvrange and then forks, the vrange could be copied
from parent to child, so not-yet-COWed pages could be purged
unless either of the two processes marks NO_VOLATILE explicitly.

Of course, a COWed page could be purged easily because there is no link
any more.

Ack. This seems reasonable.
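
To check that we read rule 1) the same way, here is a rough userspace model of it. This is only a sketch, not kernel code: struct sharer and anon_page_purgeable() are made-up names for illustration, and the real bookkeeping would live in the per-mm vrange data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one process that still shares a not-yet-COWed anon page. */
struct sharer {
    bool marked_volatile;   /* this process marked the range volatile */
};

/*
 * Rule 1) as described above: the page is a purge candidate only when
 * every process still sharing it has marked the range volatile.
 * A COWed page has a single sharer, so only that process's mark matters.
 */
bool anon_page_purgeable(const struct sharer *sharers, size_t n)
{
    assert(sharers != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        if (!sharers[i].marked_volatile)
            return false;   /* one NO_VOLATILE sharer pins the page */
    }
    return true;
}
```

Being purgeable here only means the kernel may purge the page, not that it will.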


2) file-mvrange

A page in a volatile range will be purged only if all of the processes
that map the page have marked it as volatile AND no process maps the
page as "private". IOW, every process that maps the page must map it
as "shared" for it to be purged.

So all of the processes should mark the address range in their own
process context if they want to collaborate on a shared mapped file,
and they must guarantee that no process maps the range as "private".

Of course, the volatility state ends when the process is gone.
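
The rule proposed for 2) could be modeled roughly as follows. Again this is a userspace sketch under assumptions: struct file_mapper and file_page_purgeable() are hypothetical names, and nothing here reflects how the patches actually track mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: how one process maps a given file page. */
struct file_mapper {
    bool shared;            /* true for a shared mapping, false for private */
    bool marked_volatile;   /* process marked its own address range volatile */
};

/*
 * Rule 2) as described above: the page may be purged only if every
 * mapper marked it volatile AND no mapper holds it as a private mapping.
 */
bool file_page_purgeable(const struct file_mapper *m, size_t n)
{
    assert(m != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        if (!m[i].shared)
            return false;   /* any private mapping blocks purging */
        if (!m[i].marked_volatile)
            return false;   /* any non-volatile mapper blocks purging */
    }
    return true;
}
```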

This case doesn't seem ideal to me, but it is sort of how the current code works, in order to avoid the complexity of dealing with memory volatile ranges that cross page types (file/anonymous). That said, the current code just doesn't purge file pages marked with mvrange().

I'd much prefer file-mvrange calls to behave identically to fvrange calls.

The important point here is that the kernel doesn't *have* to purge anything ever. It's the kernel's discretion as to which volatile pages to purge when. So it's easier for now to simply not purge file pages marked volatile via mvolatile.

There is, however, the inconsistency that file pages marked volatile via fvrange and then marked non-volatile via mvrange() might still be purged. That is broken in my mind, and still needs to be addressed. The easiest out is probably just to return an error if any of the mvrange calls cover file pages. But I'd really like a better fix.
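
For reference, the "easiest out" would amount to something like the check below. This is a userspace sketch only: struct mapping, mvrange_check() and the choice of -EINVAL are all assumptions for illustration, not what the patches do today.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mapped region: [start, end) plus its backing. */
struct mapping {
    unsigned long start;
    unsigned long end;
    bool file_backed;
};

/*
 * Reject an mvrange request if any part of [start, start + len)
 * overlaps a file-backed mapping. Returns 0 if the range is acceptable,
 * or -EINVAL (an assumed error code) otherwise.
 */
int mvrange_check(const struct mapping *maps, size_t n,
                  unsigned long start, unsigned long len)
{
    unsigned long end = start + len;

    assert(maps != NULL || n == 0);

    if (end < start)
        return -EINVAL;     /* address range wrapped around */

    for (size_t i = 0; i < n; i++) {
        bool overlaps = maps[i].start < end && start < maps[i].end;

        if (overlaps && maps[i].file_backed)
            return -EINVAL; /* range touches file pages: refuse */
    }
    return 0;
}
```

This avoids the fvrange/mvrange inconsistency by refusing the mixed case outright, which is why it is only a stopgap.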


3) fvrange

It's the same as 2), but the volatility state could be persistent in the
address_space until someone calls fvrange(NO_VOLATILE).
So it could remove the weakness of 2).
What do you think about the above semantics?


I'd still like mvrange() calls on shared mapped files to be stored on the address_space.


If you don't have any problem with it, we could implement it. I think 1)
and 2) could be handled with my base code for anon-vrange handling, with
some tweaking for file-vrange, and 3) would need your new patches in
address_space.

I think we can get it sorted out. It might just take a few iterations.

thanks
-john


