Re: [PATCH 1/6] mm: tracking shared dirty pages

From: Peter Zijlstra
Date: Thu Jun 22 2006 - 19:01:44 EST


On Thu, 2006-06-22 at 21:52 +0100, Hugh Dickins wrote:
> On Mon, 19 Jun 2006, Peter Zijlstra wrote:

> > +static inline int is_shared_writable(unsigned int flags)
> > +{
> > + return (flags & (VM_SHARED|VM_WRITE|VM_PFNMAP)) ==
> > + (VM_SHARED|VM_WRITE);
> > +}
> > +
>
> Andrew asked for the inclusion of VM_PFNMAP to be commented there,
> I don't believe that's enough: a function called "is_shared_writable"
> should be testing precisely that, or people will misuse it.
>
> Either you change the name to "is_shared_writable_but_not_pfnmap"
> or somesuch, or you split out the VM_PFNMAP test, or you do away
> with the function and make the tests explicit inline. As before,
> my instinctive preference is the latter: I really want to see what's
> being tested (especially in do_wp_page); but perhaps it'll just look
> too ugly all over - give it a try and see.

*sigh*, that's it, explicit it will be :-)
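
For illustration, here is a minimal userspace sketch of the naming problem Hugh describes. The flag values are stand-ins chosen for the example (the real definitions live in the kernel's mm headers), and both function names are hypothetical. It only shows that the patch's helper tests more than its name suggests.

```c
#include <stdbool.h>

/* Illustrative stand-ins; the real values come from the kernel headers. */
#define VM_WRITE	0x00000002UL
#define VM_SHARED	0x00000008UL
#define VM_PFNMAP	0x00000400UL

/* What the helper in the patch tests: shared, writable, and NOT pfnmap. */
static bool shared_writable_not_pfnmap(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_WRITE | VM_PFNMAP)) ==
	       (VM_SHARED | VM_WRITE);
}

/* What a reader would expect from the name "is_shared_writable". */
static bool shared_writable_only(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_WRITE)) == (VM_SHARED | VM_WRITE);
}
```

The two disagree exactly for a shared writable VM_PFNMAP vma, which is the case a caller could get wrong when relying on the name alone.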

> > + /*
> > + * This is not fully correct in the light of trapping write faults
> > + * for writable shared mappings. However since we're going to mark
> > + * the page dirty anyway some few lines downward, we might as well
> > + * take the write fault now.
> > + */
>
> I don't understand what you're getting at here: please explain,
> what is not fully correct and why? In mail first, then we can
> decide what the comment should say, or if it should be removed.
> follow_page isn't making a pte writable, so what's the issue?

I have no idea either, I reread this part earlier today and found it one
big brainfart. It does indeed seem to do the right thing.

> > - if (unlikely(vma->vm_flags & VM_SHARED)) {
> > + if (unlikely(is_shared_writable(vma->vm_flags))) {
>
> Most interesting line in the series, yes, and I'd find it
> easier to think through if it showed the flags test explicitly:
> if ((vma->vm_flags & (VM_SHARED|VM_WRITE|VM_PFNMAP)) ==
> (VM_SHARED|VM_WRITE))
>
> Yes, Andrew, you're right it's a change in behaviour from David's
> page_mkwrite patch. I've realized that when I was originally
> reviewing David's patch, I believed do_wp_page was mistaken to be
> doing COW on VM_SHARED areas. But Linus has since asserted very
> forcefully that it's intentional, that ptrace poke on a VM_SHARED
> area which is currently not !VM_WRITE should COW it, so I mentioned
> that to Peter.
>
> Has he got the test right there now? Ummm... maybe: my brain
> exploded weeks ago. Several strangenesses collide here, I'll
> try again tomorrow, maybe others will argue it to certainty before.

I don't think the VM_PFNMAP test is needed here, but it doesn't hurt either.
As I said, I'll write the tests out explicitly from now on.

> > @@ -1084,18 +1086,13 @@ munmap_back:
> > error = file->f_op->mmap(file, vma);
> > if (error)
> > goto unmap_and_free_vma;
> > +
>
> Do you really need this blank line?

:-) uhu..

> > + /*
> > + * Tracking of dirty pages for shared writable mappings. Do this by
> > + * write protecting writable pages, and mark dirty in the write fault.
> > + *
> > + * Modify vma->vm_page_prot (the default protection for new pages)
> > + * to this effect.
> > + *
> > + * Cannot do before because the condition depends on:
> > + * - backing_dev_info having the right capabilities
> > + * (set by f_op->open())
>
> Is that so, backing_dev_info set by f_op->open()?
> And how would that be a problem here if it were so?

Useless information indeed, a remnant from when I placed the
vm_page_prot modification between the two calls. I shall remove it.

> > + * - vma->vm_flags being fully set
> > + * (finished in f_op->mmap(), which could call remap_pfn_range())
> > + *
> > + * Also, cannot reset vma->vm_page_prot from vma->vm_flags because
> > + * f_op->mmap() can modify it.
> > + */
> > + if (is_shared_writable(vm_flags) && vma->vm_file)
> > + mapping = vma->vm_file->f_mapping;
> > + if ((mapping && mapping_cap_account_dirty(mapping)) ||
> > + (vma->vm_ops && vma->vm_ops->page_mkwrite))
>
> The only way "mapping" might be set is just above.
> Wouldn't it all be clearer (though more indented) if you said
>
> if (is_shared_writable(vm_flags) && vma->vm_file) {
> mapping = vma->vm_file->f_mapping;
> if ((mapping && mapping_cap_account_dirty(mapping)) ||
> (vma->vm_ops && vma->vm_ops->page_mkwrite)) {
> vma->vm_page_prot = whatever;
> }
> }
>
> Or no need for "mapping" here at all if you change
> mapping_cap_account_dirty(vma->vm_file->f_mapping)
> to do the right thing with NULL.

Made it one big if statement, perhaps too big; we'll see.
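
As a rough userspace sketch of what a single combined condition could look like, following the nesting in Hugh's suggestion: the struct types, the flag values, and the names needs_dirty_tracking() and mapping_accounts_dirty() are all assumptions made for this example, not the code in the next patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel flags. */
#define VM_WRITE	0x00000002UL
#define VM_SHARED	0x00000008UL
#define VM_PFNMAP	0x00000400UL

/* Simplified models of the kernel structures involved. */
struct address_space_model { bool cap_account_dirty; };
struct file_model { struct address_space_model *f_mapping; };
struct vm_ops_model { int (*page_mkwrite)(void); };
struct vma_model {
	unsigned long vm_flags;
	struct file_model *vm_file;
	const struct vm_ops_model *vm_ops;
};

/* NULL-tolerant capability check, as Hugh suggests. */
static bool mapping_accounts_dirty(const struct address_space_model *m)
{
	return m && m->cap_account_dirty;
}

/* Example hook so a vm_ops with page_mkwrite can be constructed. */
static int example_page_mkwrite(void)
{
	return 0;
}

/* One condition: shared writable non-pfnmap file mapping that either
 * accounts dirty pages or wants page_mkwrite notification. */
static bool needs_dirty_tracking(const struct vma_model *vma)
{
	assert(vma != NULL);
	return (vma->vm_flags & (VM_SHARED | VM_WRITE | VM_PFNMAP)) ==
			(VM_SHARED | VM_WRITE) &&
	       vma->vm_file &&
	       (mapping_accounts_dirty(vma->vm_file->f_mapping) ||
		(vma->vm_ops && vma->vm_ops->page_mkwrite));
}
```

Folding the NULL check into the capability helper is what removes the need for a separate "mapping" local.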

>
> > + vma->vm_page_prot =
> > + __pgprot(pte_val
> > + (pte_wrprotect
> > + (__pte(pgprot_val(vma->vm_page_prot)))));
> > +
>
> In other mail I've suggested saving vm_page_prot above, and
> changing it here only if the driver's ->mmap did not change it.

Yes, that was a very good suggestion and has already been incorporated,
thanks.
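
A small userspace model of that idea, for illustration only: the protection bits, the struct, and the function names are invented for the sketch, and clearing a single write bit stands in for the pte_wrprotect() round trip in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented protection bits; real pgprot layouts are per-architecture. */
#define MODEL_PROT_PRESENT	0x001UL
#define MODEL_PROT_WRITE	0x002UL
#define MODEL_PROT_NOCACHE	0x100UL

struct prot_vma_model { unsigned long vm_page_prot; };

typedef void (*mmap_hook_t)(struct prot_vma_model *);

/* A driver ->mmap that leaves the default protection alone. */
static void driver_keeps_prot(struct prot_vma_model *vma)
{
	(void)vma;
}

/* A driver ->mmap that installs its own protection bits. */
static void driver_changes_prot(struct prot_vma_model *vma)
{
	vma->vm_page_prot |= MODEL_PROT_NOCACHE;
}

/* Save vm_page_prot before calling the driver, and write-protect
 * afterwards only if the driver did not change it. */
static void map_with_tracking(struct prot_vma_model *vma,
			      mmap_hook_t driver_mmap, bool track)
{
	unsigned long saved;

	assert(vma != NULL);
	saved = vma->vm_page_prot;
	if (driver_mmap)
		driver_mmap(vma);
	if (track && vma->vm_page_prot == saved)
		vma->vm_page_prot &= ~MODEL_PROT_WRITE;
}
```

A driver that sets its own bits keeps exactly what it asked for; only untouched default protections get write-protected.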

> I remain uneasy about interfering with the permissions expected by
> strange drivers, but can't really justify my paranoia. Certainly
> you're right to exclude VM_PFNMAPs from this interference, that's
> important; I'd be less uneasy if you also exclude VM_INSERTPAGEs,
> they're strange too - but at least they're dealing with proper struct
> pages, so should be able to handle an unexpected do_wp_page; that
> leaves the driver nopage cases, which again should be okay now you're
> (one way or another) protecting specially added vm_page_prot flags.

VM_INSERTPAGE thou shalt have.

> I guess I'm just paranoid; it's irritating me that we do not have
> the right backing_dev_infos in place and having to hack around it.

Sad situation but true.

> > +static int page_mkclean_file(struct address_space *mapping, struct page *page)
> > +{
> > + pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > + struct vm_area_struct *vma;
> > + struct prio_tree_iter iter;
> > + int ret = 0;
> > +
> > + BUG_ON(PageAnon(page));
> > +
> > + spin_lock(&mapping->i_mmap_lock);
> > + vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> > + int protect = mapping_cap_account_dirty(mapping) &&
> > + is_shared_writable(vma->vm_flags);
> > + ret += page_mkclean_one(page, vma, protect);
>
> You have a good point here, one I'd completely missed: because a vma
> may have been recently mprotected !VM_WRITE, you have to check readonly
> mappings too. Perhaps worth a comment. But I think "is_shared_writable"
> is not the best test here: just test for VM_SHARED vmas, they're the
> only ones which can be mprotected to/from shared writable. And then
> I think you don't need to pass down an additional "protect" argument?
> It's only being called for mapping_cap_account_dirty mappings anyway,
> isn't it?

Well, no, not anymore. I thought to make it actually do what its name
says it does: clean the page's PTEs (I am even pondering implementing
the anonymous branch).

In that light, it's now called for each page.
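
To make the mprotect point concrete, here is a hedged userspace model of walking the vmas that map a page and testing only VM_SHARED, as Hugh suggests. The types, the flag values, and mkclean_model() are made up for the sketch; it is not what the next patch will necessarily do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel flags. */
#define VM_WRITE	0x00000002UL
#define VM_SHARED	0x00000008UL

struct pte_model { bool dirty; bool writable; };
struct mapped_vma { unsigned long vm_flags; struct pte_model pte; };

/* Clean and write-protect the page's pte in every VM_SHARED vma.
 * VM_WRITE is deliberately not tested: a vma that was recently
 * mprotected read-only can still hold a dirty pte.
 * Returns the number of ptes that were changed. */
static int mkclean_model(struct mapped_vma *vmas, size_t n)
{
	int cleaned = 0;

	assert(vmas != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct pte_model *pte = &vmas[i].pte;

		if (!(vmas[i].vm_flags & VM_SHARED))
			continue;	/* private mapping: leave it alone */
		if (pte->dirty || pte->writable) {
			pte->dirty = false;
			pte->writable = false;
			cleaned++;
		}
	}
	return cleaned;
}
```

Testing VM_SHARED|VM_WRITE instead would skip the read-only shared vma and leave its dirty pte behind.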


New patch will follow shortly since I can't seem to sleep anyway...


