Re: [tip:numa/core] sched/numa/mm: Improve migration

From: Ingo Molnar
Date: Sun Oct 21 2012 - 14:17:15 EST



* Johannes Weiner <hannes@xxxxxxxxxxx> wrote:

> On Thu, Oct 18, 2012 at 10:05:39AM -0700, tip-bot for Peter Zijlstra wrote:
> > Commit-ID: 713f937655c4b15131b5a0eae4610918a4febe17
> > Gitweb: http://git.kernel.org/tip/713f937655c4b15131b5a0eae4610918a4febe17
> > Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> > AuthorDate: Fri, 12 Oct 2012 19:30:14 +0200
> > Committer: Ingo Molnar <mingo@xxxxxxxxxx>
> > CommitDate: Mon, 15 Oct 2012 14:18:40 +0200
> >
> > sched/numa/mm: Improve migration
> >
> > Add THP migration. Extend task_numa_fault() to absorb THP faults.
> >
> > [ Would be nice if the gents on Cc: expressed their opinion about
> > this change. A missing detail might be cgroup page accounting,
> > plus the fact that some architectures might cache PMD_NONE pmds
> > in their TLBs, needing some extra TLB magic beyond what we already
> > do here? ]
>
> Looks good to me, the cgroup fixup should be easy enough as well
> (added the calls inline below).
>
> Of course I'm banging my head against a wall for not seeing earlier,
> via the existing migration path, how easy this could be. It would
> be great for compaction to have this fastpath in the traditional
> migration code too.
>
> Right now, unlike the traditional migration path, this breaks COW for
> every migration, but maybe you don't care about shared pages in the
> first place. And fixing that should be nothing more than grabbing the
> anon_vma lock and using rmap to switch more than one pmd over, right?
>
> It won't work for pages in swap, which is only a future problem.
>
> It's slightly ugly that migrate_page_copy() actually modifies the
> existing page (deactivation, munlock) when you end up having to revert
> back to it.
>
> The new page needs to be PageUptodate.
>
> > +        task_numa_placement();
> > +
> > +        new_page = alloc_pages_node(node,
> > +                (GFP_TRANSHUGE | GFP_THISNODE) & ~(__GFP_NO_KSWAPD | __GFP_WAIT),
> > +                HPAGE_PMD_ORDER);
> > +
> > +        WARN_ON(PageLRU(new_page));

This WARN_ON() is somewhat problematic in OOM or OOLNM (out of
local node memory) situations, so I removed it ;-)

> > +
> > +        if (!new_page)
> > +                goto alloc_fail;
>
>         mem_cgroup_prepare_migration(page, new_page, &memcg);
>
> > +        lru = PageLRU(page);
> > +
> > +        if (lru && isolate_lru_page(page)) /* does an implicit get_page() */
> > +                goto alloc_fail;
> > +
> > +        if (!trylock_page(new_page))
> > +                BUG();
> > +
> > +        /* anon mapping, we can simply copy page->mapping to the new page: */
> > +        new_page->mapping = page->mapping;
> > +        new_page->index = page->index;
> > +
> > +        migrate_page_copy(new_page, page);
> > +
> > +        WARN_ON(PageLRU(new_page));
> >
> > -do_fixup:
> >          spin_lock(&mm->page_table_lock);
> > -        if (unlikely(!pmd_same(*pmd, entry)))
> > -                goto out_unlock;
> > -#endif
> > +        if (unlikely(!pmd_same(*pmd, entry))) {
> > +                spin_unlock(&mm->page_table_lock);
> > +                if (lru)
> > +                        putback_lru_page(page);
> >
> > -        /* change back to regular protection */
> > -        entry = pmd_modify(entry, vma->vm_page_prot);
> > -        if (pmdp_set_access_flags(vma, haddr, pmd, entry, 1))
> > -                update_mmu_cache(vma, address, entry);
> > +                unlock_page(new_page);
> > +                ClearPageActive(new_page);      /* Set by migrate_page_copy() */
> > +                new_page->mapping = NULL;
> > +                put_page(new_page);             /* Free it */
> >
> > -out_unlock:
> > +                unlock_page(page);
> > +                put_page(page);                 /* Drop the local reference */
> > +
> > +                return;
> > +        }
> > +
> > +        entry = mk_pmd(new_page, vma->vm_page_prot);
> > +        entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> > +        entry = pmd_mkhuge(entry);
> > +
> > +        page_add_new_anon_rmap(new_page, vma, haddr);
> > +
> > +        set_pmd_at(mm, haddr, pmd, entry);
> > +        update_mmu_cache(vma, address, entry);
> > +        page_remove_rmap(page);
> >          spin_unlock(&mm->page_table_lock);
> > -        if (page)
> > +
> > +        put_page(page);                 /* Drop the rmap reference */
> > +
> > +        task_numa_fault(node, HPAGE_PMD_NR);
> > +
> > +        if (lru)
> > +                put_page(page);         /* drop the LRU isolation reference */
> > +
> > +        unlock_page(new_page);
>
>         mem_cgroup_end_migration(memcg, page, new_page, true);
>
> > +        unlock_page(page);
> > +        put_page(page);                 /* Drop the local reference */
> > +
> > +        return;
> > +
> > +alloc_fail:
> > +        if (new_page)
> > +                put_page(new_page);
>         mem_cgroup_end_migration(memcg, page, new_page, false);
> }
>
> > +        task_numa_fault(page_to_nid(page), HPAGE_PMD_NR);
> > +        unlock_page(page);
> > +
> > +        spin_lock(&mm->page_table_lock);
> > +        if (unlikely(!pmd_same(*pmd, entry))) {
> >                  put_page(page);
> > +                page = NULL;
> > +                goto unlock;
> > +        }
> > +        goto fixup;
> > }

Cool!

Would any of the gents be interested in turning the suggestions
above into a suitable patch against this tree?

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git numa/core

Thanks a ton!

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/