Re: [PATCH] mm: fix special swap entry handling on copy mm

From: Jerome Glisse
Date: Mon Aug 12 2013 - 17:50:40 EST


On Mon, Aug 12, 2013 at 05:36:40PM -0400, Naoya Horiguchi wrote:
> Hi Jerome,
>
> On Mon, Aug 12, 2013 at 11:43:24AM -0400, j.glisse@xxxxxxxxx wrote:
> > From: Jerome Glisse <jglisse@xxxxxxxxxx>
> >
> > Prior to this copy_one_pte will never reach the special swap file
> > handling code because swap_duplicate will return invalid value.
> >
> > Note this is not fatal so nothing bad ever happen because of that.
> > Reason is that copy_pte_range would break of its loop and call
> > add_swap_count_continuation which would see its a special swap
> > file and return 0 triggering copy_pte_range to try again. Because
> > we try again there is a huge chance that the temporarily special
> > migration pte is now again valid and pointing to a new valid page.
> >
> > This patch just split handling of special swap entry from regular
> > one inside copy_one_pte.
> >
> > (Note i spotted that while reading code i haven't tested my theory.)
> >
> > Signed-off-by: Jerome Glisse <jglisse@xxxxxxxxxx>
>
> non_swap_entry() means not only migration entry, but also hwpoison entry,
> so it seems to me that simply moving the swap_duplicate() into the
> if(!non_swap_entry) block can change the behavior for hwpoison entry.
> Would it be nice to add check for such a case?
>
> Thanks,
> Naoya Horiguchi

Well if my reading of the code is right for hwpoison entry current code will
loop indefinitly inside the kernel on fork if one entry is set to hwpoison.

My patch does not handle hwpoison because it seems useless as there is nothing
to do for hwpoison pte beside giving setting the new pte to hwpoison to. So
the fork child will also have a pte with hwpoison. My patch do just that.

So change in behavior is current kernel loop indefinitly in kernel with hwpoison
pte on fork, vs child get hwpoison pte with my patch. Meaning that both child
and father can live as long as they dont access the hwpoisoned ptes.

Cheers,
Jerome

>
> > ---
> > mm/memory.c | 26 +++++++++++++-------------
> > 1 file changed, 13 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 1ce2e2a..9f907dd 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -833,20 +833,20 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > if (!pte_file(pte)) {
> > swp_entry_t entry = pte_to_swp_entry(pte);
> >
> > - if (swap_duplicate(entry) < 0)
> > - return entry.val;
> > -
> > - /* make sure dst_mm is on swapoff's mmlist. */
> > - if (unlikely(list_empty(&dst_mm->mmlist))) {
> > - spin_lock(&mmlist_lock);
> > - if (list_empty(&dst_mm->mmlist))
> > - list_add(&dst_mm->mmlist,
> > - &src_mm->mmlist);
> > - spin_unlock(&mmlist_lock);
> > - }
> > - if (likely(!non_swap_entry(entry)))
> > + if (likely(!non_swap_entry(entry))) {
> > + if (swap_duplicate(entry) < 0)
> > + return entry.val;
> > +
> > + /* make sure dst_mm is on swapoff's mmlist. */
> > + if (unlikely(list_empty(&dst_mm->mmlist))) {
> > + spin_lock(&mmlist_lock);
> > + if (list_empty(&dst_mm->mmlist))
> > + list_add(&dst_mm->mmlist,
> > + &src_mm->mmlist);
> > + spin_unlock(&mmlist_lock);
> > + }
> > rss[MM_SWAPENTS]++;
> > - else if (is_migration_entry(entry)) {
> > + } else if (is_migration_entry(entry)) {
> > page = migration_entry_to_page(entry);
> >
> > if (PageAnon(page))
> > --
> > 1.8.3.1
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/