Re: RFC: [PATCH-2.6] Add helper function to lock multiple page cache pages- nopage alternative

From: Bryan Henderson
Date: Thu Feb 03 2005 - 14:31:01 EST


>> > > And for the vmscan->writepage() side of things I wonder if it would be
>> > > possible to overload the mapping's ->nopage handler. If the target page
>> > > lies in a hole, go off and allocate all the necessary pagecache pages, zero
>> > > them, mark them dirty?
>> >
>> > I guess it would be possible but ->nopage is used for the read case and
>> > why would we want to then cause writes/allocations?
>>
>> yup, we'd need to create a new handler for writes, or pass `write_access'
>> into ->nopage. I think others (dwdm2?) have seen a need for that.
>
>That would work as long as all writable mappings are actually written to
>everywhere. Otherwise you still get that reading the whole mmap()ped
>area but writing a small part of it would still instantiate all of it on
>disk. As far as I understand this there is no way to hook into the mmap
>system such that we have a hook whenever a mmap()ped page gets written
>to for the first time. (I may well be wrong on that one so please
>correct me if that is the case.)

I think the point is that we can't have a "handler for writes," because
the writes are being done by simple CPU Store instructions in a user
program. The handler we're talking about is just for page faults. Other
operating systems approach this by actually _having_ a handler for a CPU
store instruction, in the form of a page protection fault handler -- the
nopage routine adds the page to the user's address space, but write
protects it. The first time the user tries to store into it, the
filesystem driver gets a chance to do what's necessary to support a dirty
cache page -- allocate a block, add additional dirty pages to the cache,
etc. It would be wonderful to have that in Linux. I saw hints of such
code in a Linux kernel once (a "write_protect" address space operation or
something like that); I don't know what happened to it.
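
A user-space analogy may make the write-protection approach concrete. The
sketch below is not kernel code and not any filesystem's implementation: it
maps a region read-only, catches the protection fault raised by the first
store to each page, records that page as dirty, and only then makes it
writable. All names (wp_setup, wp_on_fault, wp_dirty, WP_MAX_PAGES) are
invented for illustration. It assumes a POSIX system with MAP_ANONYMOUS, and
it calls mprotect() from a signal handler, which works in practice on Linux
but is not on the POSIX list of async-signal-safe functions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define WP_MAX_PAGES 64         /* hypothetical limit for this sketch */

static char *wp_base;           /* start of the tracked region */
static size_t wp_npages;        /* number of pages in the region */
static size_t wp_page_size;
static volatile sig_atomic_t wp_dirty[WP_MAX_PAGES];

/* Runs on the first store to a write-protected page: the point where a
 * filesystem would get its chance to prepare for a dirty page. */
static void wp_on_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)wp_base;
    size_t idx;

    (void)sig;
    (void)ctx;

    /* A fault outside the tracked region is a genuine bug: give up. */
    if (addr < base || addr >= base + wp_npages * wp_page_size)
        _exit(1);

    idx = (addr - base) / wp_page_size;
    wp_dirty[idx] = 1;

    /* Lift the write protection; the faulting store is then retried. */
    if (mprotect(wp_base + idx * wp_page_size, wp_page_size,
                 PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

/* Map npages of zeroed, read-only memory and install the fault handler.
 * Returns 0 on success, -1 on failure. */
static int wp_setup(size_t npages)
{
    struct sigaction sa;
    void *map;

    if (npages == 0 || npages > WP_MAX_PAGES)
        return -1;

    wp_page_size = (size_t)sysconf(_SC_PAGESIZE);
    map = mmap(NULL, npages * wp_page_size, PROT_READ,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;
    wp_base = map;
    wp_npages = npages;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = wp_on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    /* Linux reports this fault as SIGSEGV; some systems use SIGBUS. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}
```

After wp_setup(4), reading wp_base[0] leaves every wp_dirty[] entry at 0,
while a store anywhere in the third page sets wp_dirty[2] and nothing else.
Reads never reach the handler, which is the property the nopage-only approach
lacks.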

Short of that, I don't see any way to avoid sometimes filling in holes due
to reads. It's not a huge problem, though -- it requires someone to do a
shared writable mmap and then read lots of holes and not write to them,
which is a pretty rare situation for a normal file.
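
The access pattern in question can be reproduced from user space. The sketch
below assumes a POSIX system with a writable /tmp; the helper name hole_demo
is invented for illustration. It creates a sparse file, maps it shared and
writable, reads one byte from every page, then stores to the first page only,
and reports st_blocks (512-byte units) after each step. Whether the read pass
allocates anything depends entirely on the filesystem underneath, so the
sketch makes no claim about what the numbers will be.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for mkstemp/ftruncate under strict -std modes */
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 on success and fills in the allocated block counts observed
 * after the read-only pass and after the single write; -1 on failure. */
static int hole_demo(size_t npages, long long *blocks_after_read,
                     long long *blocks_after_write)
{
    char path[] = "/tmp/hole_demo_XXXXXX";
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    struct stat st;
    unsigned long sum = 0;
    volatile char *p;
    void *map;
    size_t i;
    int fd, rc = -1;

    if (npages == 0)
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file vanishes once fd is closed */

    /* Extending an empty file leaves it as one large hole. */
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_close;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;
    p = map;

    /* Read every page of the shared writable mapping, writing nothing. */
    for (i = 0; i < npages; i++)
        sum += (unsigned char)p[i * page];
    if (sum != 0 || fstat(fd, &st) != 0)    /* holes must read as zeros */
        goto out_unmap;
    *blocks_after_read = (long long)st.st_blocks;

    /* Dirty exactly one page and push it to the file. */
    p[0] = 1;
    if (msync(map, len, MS_SYNC) != 0 || fstat(fd, &st) != 0)
        goto out_unmap;
    *blocks_after_write = (long long)st.st_blocks;
    rc = 0;

out_unmap:
    munmap(map, len);
out_close:
    close(fd);
    return rc;
}
```

A driver that fills holes on read faults would show the first count already
covering the whole file; one with a write-protection hook would show
allocation only after the store.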

I didn't follow how the helper function solves this problem. If it's
something involving adding the required extra pages to the cache at
pageout time, then that's not going to work -- you can't make adding pages
to the cache a prerequisite for cleaning a page -- that would be Deadlock
City.

My large-block filesystem driver does the nopage thing, and does in fact
fill in files unnecessarily in this scenario. :-( The driver for the
same filesystems on AIX does not, though. It has the write protection
thing.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
