Re: [PATCH 04/14] mm,migration: Allow the migration ofPageSwapCache pages

From: Andrea Arcangeli
Date: Fri Apr 23 2010 - 14:32:52 EST


Hi Mel,

On Thu, Apr 22, 2010 at 04:44:43PM +0100, Mel Gorman wrote:
> heh, I thought of a similar approach at the same time as you but missed
> this mail until later. However, with this approach I suspect there is a
> possibility that two walkers of the same anon_vma list could livelock if
> two locks on the list are held at the same time. Am still thinking of
> how it could be resolved without introducing new locking.

Trying to understand this issue and I've some questions. This
vma_adjust and lock inversion troubles with the anon-vma lock in
rmap_walk are a new issue introduced by the recent anon-vma changes,
right?

About swapcache, try_to_unmap just nuke the mappings, establish the
swap entry in the pte (not migration entry), and then there's no need
to call remove_migration_ptes. So it just need to skip it for
swapcache. page_mapped must return zero after try_to_unmap returns
before we're allowed to migrate (plus the page count must be just 1
and not 2 or more for gup users!).

I don't get what's the problem about swapcache and the races connected
to it, the moment I hear migration PTE in context of swapcache
migration I'm confused because there's no migration PTE for swapcache.

The new page will have no mappings either, it just needs to be part of
the swapcache with the same page->index = swapentry, indexed in the
radix tree with that page->index, and paga->mapping pointing to
swapcache. Then new page faults will bring it in the pageatables. The
lookup_swap_cache has to be serialized against some lock, it should be
the radix tree lock? So the migration has to happen with that lock
hold no? We can't just migrate swapcache without stopping swapcache
radix tree lookups no? I didn't digest the full migrate.c yet and I
don't see where it happens. Freeing the swapcache while simpler and
safer, would be quite bad as it'd create I/O for potentially hot-ram.

About the refcounting of anon-vma in migrate.c I think it'd be much
simpler if zap_page_range and folks would just wait (like they do if
they find a pmd_trans_huge && pmd_trans_splitting pmd), there would be
no need of refcounting the anon-vma that way.

I assume whatever is added to rmap_walk I also have to add to
split_huge_page later when switching to mainline anon-vma code (for
now I stick to 2.6.32 anon-vma code to avoid debugging anon-vma-chain,
memory compaction, swapcache migration and transparent hugepage at the
same time, which becomes a little beyond feasibility).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/