Re: remove-stale-comment-from-swapfilec.patch added to -mm tree
From: Hugh Dickins
Date: Fri Aug 26 2005 - 14:22:07 EST
On Fri, 26 Aug 2005, Blaisorblade wrote:
> On Wednesday 24 August 2005 15:26, Hugh Dickins wrote:
>
> > If do_swap_page gets
> > a write fault, it either determines it can go ahead and use the swap
> > page, or if it can't, gets do_wp_page to Copy-On-Write for it (that's
> > a call I added in 2.6.7, as an optimization, and as a necessity for
> > correct behaviour of ptrace's get_user_pages; the latter has just in
> > 2.6.13-rc been made more resilient, so we could remove do_swap_page's
> > call to do_wp_page now - though I'm inclined to let it stay as an
> > optimization, avoiding the second fault which would follow).
> get_user_pages() can still get two faults there, because VM_FAULT_WRITE is not
> returned by do_swap_page(). And faults can be very expensive (for UML a fault
> is given by a SIGSEGV delivery).
You're right that it can get two "faults" there, but it's such a rare case
(ptrace modifying an area readonly to the process) that I didn't bother
about it. It isn't even two real faults, just two iterations within
get_user_pages - or does that somehow get worse in the UML case?
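
For illustration only, the "two iterations" can be shown with a toy model of the get_user_pages retry loop. This is a sketch, not kernel code: every toy_* name and TOY_* constant is hypothetical, and it assumes the ptrace case above (a write forced into an area that is read-only to the process, so the pte never becomes writable). The loop stops insisting on a writable pte only once a fault reports the write as handled, and the swap-in path does not report it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fault return bits; not the kernel's values. */
#define TOY_FAULT_MINOR 0x1
#define TOY_FAULT_WRITE 0x2   /* models VM_FAULT_WRITE */

struct toy_pte {
    bool present;    /* page is mapped */
    bool writable;   /* stays false: the area is read-only to the process */
};

/* One pass through the fault handler for a forced write access. */
static int toy_fault(struct toy_pte *pte, bool swap_path_reports_write)
{
    if (!pte->present) {
        /* Swap-in path: the page is brought in (and COWed if needed),
         * but the "write handled" bit is reported only if the caller
         * asks this model to pretend that it is. */
        pte->present = true;
        return swap_path_reports_write
            ? (TOY_FAULT_MINOR | TOY_FAULT_WRITE)
            : TOY_FAULT_MINOR;
    }
    /* Write-protect path: page already present, write is reported. */
    return TOY_FAULT_MINOR | TOY_FAULT_WRITE;
}

/* Returns how many times the loop had to call the fault handler. */
static int toy_get_page_for_write(struct toy_pte *pte,
                                  bool swap_path_reports_write)
{
    bool need_write = true;
    int iterations = 0;

    /* Lookup fails while the page is absent, or while a writable pte
     * is still required and the pte is not writable. */
    while (!pte->present || (need_write && !pte->writable)) {
        iterations++;
        if (toy_fault(pte, swap_path_reports_write) & TOY_FAULT_WRITE)
            need_write = false;   /* page is COWed; stop requiring write */
    }
    return iterations;
}
```

In this model a swapped-out page costs two passes through the loop when the swap-in path does not report the write, and one pass when it does. Both passes happen inside the same call, which is the sense in which they are not two real faults.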
> > If do_swap_page gets a read fault, it doesn't COW at all.
>
> > I don't know what the "early" COW break referred to is: the write_access
> > call to do_wp_page could be deferred, yes, but it's hardly early.
> The idea in my mind is that after loading the page from swap the first time
> there's no need to copy the page to give a private copy to the process, if
> the page is kept on swap.
>
> We COW it anyway to break the sharing, to keep the original copy in the
> swapcache, instead of reading it again from the disk. This is *early*.
We always prefer not to read from the disk. You're right that we could
choose to remove the page from the swap cache at that point (locking
considerations?) and make it private (in the case where it has actually
been written to the disk, often not yet so), but that's not how the page
cache has ever been treated. Avoid going to slow disk at all costs.
Hugh
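
As a rough summary of the do_swap_page behaviour described in this thread, the decision can be written as a small toy function. This is only a sketch under stated assumptions: the toy_* names and the `exclusive` flag are hypothetical and stand in for whatever check decides that the faulting process may go ahead and use the swap page directly. It is not the kernel's code.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not kernel structures. */
enum toy_outcome {
    TOY_MAP_READONLY,    /* read fault: map the page, no COW at all */
    TOY_REUSE_WRITABLE,  /* write fault, page usable as-is: map it writable */
    TOY_COPY_ON_WRITE    /* write fault, page still shared: take the COW path */
};

struct toy_page {
    bool exclusive;      /* assumption: true if no one else shares the page */
};

static enum toy_outcome toy_swap_fault(const struct toy_page *page,
                                       bool write_access)
{
    if (!write_access)
        return TOY_MAP_READONLY;      /* read faults never COW */
    if (page->exclusive)
        return TOY_REUSE_WRITABLE;    /* go ahead and use the swap page */
    return TOY_COPY_ON_WRITE;         /* avoids a second fault later */
}
```

The third branch is the optimization discussed above: doing the copy during the swap-in fault saves the write-protect fault that would otherwise follow immediately.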