Re: what happened to page_mkwrite? - was: Re: page_mkwrite seems broken

From: Anton Altaparmakov
Date: Mon Oct 24 2005 - 16:19:17 EST


On Mon, 24 Oct 2005, Hugh Dickins wrote:
> On Mon, 24 Oct 2005, Anton Altaparmakov wrote:
> > On Mon, 24 Oct 2005, Hugh Dickins wrote:
> >
> > Now you got me completely confused. Just when I thought I was
> > understanding things. (-; Let me repeat what you say with some questions
> > thrown in... Please bear with me and help me beat some clue into my
> > head... (-:
>
> Sorry for confusing you. I can't answer many of your questions, because
> I don't know what you're doing or intending to do. But you expressed an
> aversion to allocating pages unnecessarily. Probably that made me think
> of memory allocation where you meant disk allocation.
>
> Cutting a lot of questions...
>
> > If your answer above is that the pages are normal page cache pages, then:
>
> Nothing special needs doing if you choose to use normal page cache pages
> even for the holes.

Great! I have no intention of using ZERO_PAGE(). Just normal page cache
pages that are memset() to zero when sparse. Phew. /me relaxes (((-:

> Sorry for the confusion: I was just trying to warn you of some difficulties
> and their solution, if you were intending to pursue an alternative path.

No need to apologize!

If I had wanted to use the ZERO_PAGE() then you would be right and I would
have missed all those things you said, but I never even knew about
ZERO_PAGE(). (-: I could well see that as a nice optimization at some
point but for now I want it to work, not conserve memory. (-:

Thank you very much for your comments!

In case you are curious, ntfs allows logical blocks of between 512 bytes
and many hundreds of kiB in size (but always power of 2). So to write to
a mmap()ed sparse file using a PAGE_CACHE_SIZE page into the middle of a
large, sparse logical block, I need to allocate the whole block on disk
and cause all page cache pages to be zeroed and marked dirty. Doing this
from writepage() is not possible due to deadlocks: 1) because the page is
locked already and I would need to lock all the other pages in that
logical block, so we get into deadlock city with out of order page locking
(I now only lock in ascending page index order, which requires that no
page with a higher index is already locked, and dropping the page lock in
writepage is a royal pain in the backside) and 2) because I am not meant
to go allocating memory for more pages when the system is low on memory
and is running writepage exactly so it can reclaim some memory...
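
To make the locking problem concrete, here is a minimal userspace sketch
of the page arithmetic involved. It is not ntfs driver code: the struct
page_range type and the block_page_range() helper are hypothetical names
introduced for illustration, and both sizes are assumed to be powers of 2
as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the run of page cache indices one logical block covers. */
struct page_range {
	uint64_t first;	/* index of the first page in the block */
	uint64_t count;	/* number of pages in the block (at least 1) */
};

/*
 * Hypothetical helper: return the page indices covered by the logical
 * block that contains byte offset pos.  Both sizes must be powers of 2.
 */
static struct page_range block_page_range(uint64_t pos, uint32_t block_size,
		uint32_t page_size)
{
	struct page_range r;
	uint64_t block_start;

	assert(block_size && !(block_size & (block_size - 1)));
	assert(page_size && !(page_size & (page_size - 1)));
	/* Round pos down to the start of its logical block. */
	block_start = pos & ~((uint64_t)block_size - 1);
	r.first = block_start / page_size;
	/* A block smaller than a page still lives in exactly one page. */
	r.count = block_size > page_size ? block_size / page_size : 1;
	return r;
}
```

With 4096 byte pages and a 64kiB logical block, a write at offset 307200
lands in page 75 while the block covers pages 64 to 79. The 11 pages
below index 75 are the ones that cannot be locked in ascending order once
page 75 is already locked.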

How I want to use page_mkwrite is that when it is run for a sparse logical
block of size > PAGE_CACHE_SIZE, I allocate the logical block and get hold
of all the pages (locked) that lie in that block and bring them uptodate
(by zeroing if not uptodate already) and then mark them all dirty and
release them again so the zeroes will make it to disk later. Not sure
whether to do the allocations even for logical blocks <= PAGE_CACHE_SIZE
or just leave those to writepage...

In fact before allocating the block I plan to simply do a page cache read
(via read_cache_page() which will give me uptodate, cleared pages) then do
the allocation, then mark the pages dirty and that's it. Writepage will
later cause the buffers in the pages to be mapped to the new on-disk
location and will write the dirty pages to disk. (I may map the buffers
in the pages there and then as an optimization given I have the pages and I
know the on-disk location but I am not sure I will do that, at least
probably not initially as it only makes the code more complex for very
little gain.)
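
The flow described in the last two paragraphs can be modelled in
userspace roughly as follows. This is only a sketch under stated
assumptions: struct sim_page, SIM_PAGE_SIZE and sim_fill_sparse_block()
are made-up stand-ins for the page cache, not kernel interfaces, and the
on-disk allocation step is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Userspace stand-in for a page cache page; not a kernel structure. */
struct sim_page {
	bool uptodate;
	bool dirty;
	unsigned char data[SIM_PAGE_SIZE];
};

/*
 * Model of the planned flow for one sparse logical block: walk its pages
 * in ascending index order, zero any page that is not uptodate yet, then
 * mark every page dirty so the zeroes reach disk later.
 * Returns the number of pages that had to be zeroed.
 */
static size_t sim_fill_sparse_block(struct sim_page *pages, size_t first,
		size_t count)
{
	size_t i, zeroed = 0;

	for (i = first; i < first + count; i++) {
		if (!pages[i].uptodate) {
			memset(pages[i].data, 0, SIM_PAGE_SIZE);
			pages[i].uptodate = true;
			zeroed++;
		}
		/* Uptodate pages keep their contents but are dirtied too. */
		pages[i].dirty = true;
	}
	return zeroed;
}
```

Pages that are already uptodate are left alone apart from being marked
dirty, which matches the "by zeroing if not uptodate already" wording
above.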

There is another ntfs complication and this is initialized size. This is
an evil beast that says that anything between initialized size and the real
file size (inode->i_size), no matter whether it is allocated on disk or is
a sparse hole or a mixture of the two, is to be read as zeroes. The
annoying thing here is that if you have a 1TiB file that is fully
allocated on disk but has an initialized size of 0, and you write 1 byte
somewhere towards the end of the file (or even at the end), you need to
write to disk zeroes between file offset = initialized size (0 in this
example) and the position of the write, in this case 1TiB. Doing that
from writepage could never fly. But from page_mkwrite() it should work,
again in the same way as above for the sparse holes: I will do a
read_cache_page() followed by set_page_dirty() for all pages between
initialized size and the offset of the write. It just means that the first
write to such an mmap()ed file would take a _very_ long time in the
specific example above...
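
To put a number on the 1TiB example, the following sketch counts how many
pages would have to be brought uptodate and dirtied. The function
pages_to_initialize() is a hypothetical helper, not part of the ntfs
driver, and it assumes the page containing the write itself is included
in the count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of pages from the one containing
 * initialized_size up to and including the one containing write_pos.
 * Returns 0 when the write lies below initialized_size, because nothing
 * needs zeroing in that case.  page_size must be a power of 2.
 */
static uint64_t pages_to_initialize(uint64_t initialized_size,
		uint64_t write_pos, uint32_t page_size)
{
	assert(page_size && !(page_size & (page_size - 1)));
	if (write_pos < initialized_size)
		return 0;
	return write_pos / page_size - initialized_size / page_size + 1;
}
```

For an initialized size of 0, a write to the last byte of a 1TiB file and
4096 byte pages, this gives 2^28 = 268435456 pages, which is why the
first write would be so slow.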

Note for the above I plan to leverage
fs/ntfs/file.c::ntfs_attr_extend_initialized() or at least an adapted form
of it. This function does the above described but for the file write(2)
case where a user opens a file and writes somewhere beyond the initialized
size...

I hope that explains what I am doing and why, and I also hope that if you
were not interested you didn't bother reading it all and hence never saw
this sentence. (-;

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/