Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

From: Mel Gorman
Date: Thu Feb 14 2013 - 06:58:17 EST

Next message: Rafael J. Wysocki: "Re: [PATCH] PCI / ACPI: Report _OSC control mask returned on failure to get control"
Previous message: Mark Brown: "Re: [PATCH 2/2] regmap: irq: do not write mask register if it is notsupported"
In reply to: Hugh Dickins: "Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly"
Next in thread: Hugh Dickins: "Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Feb 08, 2013 at 11:33:40AM -0800, Hugh Dickins wrote:
> > > <SNIP>
> > >
> > > 2. __ksm_enter() has a nice little optimization, to insert the new mm
> > > just behind ksmd's cursor, so there's a full pass for it to stabilize
> > > (or be removed) before ksmd addresses it. Nice when ksmd is running,
> > > but not so nice when we're trying to unmerge all mms: we were missing
> > > those mms forked and inserted behind the unmerge cursor. Easily fixed
> > > by inserting at the end when KSM_RUN_UNMERGE.
> > >
> > > 3. It is possible for a KSM page to be faulted back from swapcache into
> > > an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
> > > Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
> > > to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
> > > and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
>
> What I found is that a 4th cause emerges once KSM migration
> is properly working: that interval during page migration when the old
> page has been fully unmapped but the new not yet mapped in its place.
>

For anyone else watching -- normal page migration expects to be protected
during that particular window with migration ptes. Any references to the
PTE mapping a page being migrated faults on a swap-like PTE and waits
in migration_entry_wait().

> The KSM COW breaking cannot see a page there then, so it ends up with
> a (newly migrated) KSM page left behind. Almost certainly has to be
> fixed in follow_page(), but I've not yet settled on its final form -
> the fix I have works well, but a different approach might be better.
>

follow_page() is one option. My guess is that you're thinking of adding
a FOLL_ flag that will cause follow_page() to check is_migration_entry()
and migration_entry_wait() if the flag is present.

Otherwise you would need to check for migration ptes in a number of places
under page lock and then hold the lock for long periods of time to prevent
migration starting. I did not check this option in depth because it quickly
looked like it would be a mess, with long page lock hold times and might
not even be workable.

> > > +static int remove_all_stable_nodes(void)
> > > +{
> > > + struct stable_node *stable_node;
> > > + int nid;
> > > + int err = 0;
> > > +
> > > + for (nid = 0; nid < nr_node_ids; nid++) {
> > > + while (root_stable_tree[nid].rb_node) {
> > > + stable_node = rb_entry(root_stable_tree[nid].rb_node,
> > > + struct stable_node, node);
> > > + if (remove_stable_node(stable_node)) {
> > > + err = -EBUSY;
> > > + break; /* proceed to next nid */
> > > + }
> >
> > If remove_stable_node() returns an error then it's quite possible that it'll
> > go boom when that page is encountered later but it's not guaranteed. It'd
> > be best effort to continue removing as many of the stable nodes anyway.
> > We're in trouble either way of course.
>
> If it returns an error, then indeed something we don't yet understand
> has occurred, and we shall want to debug it. But unless it's due to
> corruption somewhere, we shouldn't be in much trouble, shouldn't go boom:
> remove_all_stable_nodes() error is ignored at the end of unmerging, it
> will be tried again when changing merge_across_nodes, and an error
> then will just prevent changing merge_across_nodes at that time. So
> the mysteriously unremovable stable nodes remain the same kind of tree.
>

Ok.

--
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Rafael J. Wysocki: "Re: [PATCH] PCI / ACPI: Report _OSC control mask returned on failure to get control"
Previous message: Mark Brown: "Re: [PATCH 2/2] regmap: irq: do not write mask register if it is notsupported"
In reply to: Hugh Dickins: "Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly"
Next in thread: Hugh Dickins: "Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]