Re: mm: BUG in unmap_page_range

From: Sasha Levin
Date: Wed Sep 10 2014 - 15:06:47 EST


On 09/10/2014 08:47 AM, Mel Gorman wrote:
> migrate: debug patch to try to identify race between migration completion and mprotect
>
> A migration entry is marked as write if pte_write was true at the
> time the entry was created. The VMA protections are not double checked
> when migration entries are being removed but mprotect itself will mark
> write-migration-entries as read to avoid problems. It means we potentially
> take a spurious fault to mark these ptes write again but otherwise it's
> harmless. Still, one dump indicates that this situation can actually
> happen so this debugging patch spits out a warning if the situation occurs
> and hopefully the resulting warning will contain a clue as to how exactly
> it happens.
>
> Not-signed-off
> ---
> mm/migrate.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 09d489c..631725c 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -146,8 +146,16 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
>  	pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
>  	if (pte_swp_soft_dirty(*ptep))
>  		pte = pte_mksoft_dirty(pte);
> -	if (is_write_migration_entry(entry))
> -		pte = pte_mkwrite(pte);
> +	if (is_write_migration_entry(entry)) {
> +		/*
> +		 * This WARN_ON_ONCE is temporary for the purposes of seeing if
> +		 * it's a case encountered by trinity in Sasha's testing
> +		 */
> +		if (!(vma->vm_flags & (VM_WRITE)))
> +			WARN_ON_ONCE(1);
> +		else
> +			pte = pte_mkwrite(pte);
> +	}
>  #ifdef CONFIG_HUGETLB_PAGE
>  	if (PageHuge(new)) {
>  		pte = pte_mkhuge(pte);

I seem to have hit this warning:

[ 4782.617806] WARNING: CPU: 10 PID: 21180 at mm/migrate.c:155 remove_migration_pte+0x3f7/0x420()
[ 4782.619315] Modules linked in:
[ 4782.622189]
[ 4782.622501] CPU: 10 PID: 21180 Comm: trinity-main Tainted: G W 3.17.0-rc4-next-20140910-sasha-00032-g6825fb5-dirty #1137
[ 4782.624344] 0000000000000009 ffff8800193eb770 ffffffffa04c742a 0000000000000000
[ 4782.627801] ffff8800193eb7a8 ffffffff9d16e55d 00007f2458d89000 ffff880120959600
[ 4782.629283] ffff88012b02c000 ffffea002abeab00 ffff88063118da90 ffff8800193eb7b8
[ 4782.631353] Call Trace:
[ 4782.633789] [<ffffffffa04c742a>] dump_stack+0x4e/0x7a
[ 4782.634314] [<ffffffff9d16e55d>] warn_slowpath_common+0x7d/0xa0
[ 4782.634877] [<ffffffff9d16e63a>] warn_slowpath_null+0x1a/0x20
[ 4782.635430] [<ffffffff9d315487>] remove_migration_pte+0x3f7/0x420
[ 4782.636042] [<ffffffff9d2e99cf>] rmap_walk+0xef/0x380
[ 4782.636544] [<ffffffff9d3147f1>] remove_migration_ptes+0x41/0x50
[ 4782.637130] [<ffffffff9d315090>] ? __migration_entry_wait.isra.24+0x160/0x160
[ 4782.639928] [<ffffffff9d3154b0>] ? remove_migration_pte+0x420/0x420
[ 4782.640616] [<ffffffff9d31671b>] move_to_new_page+0x16b/0x230
[ 4782.641251] [<ffffffff9d2e9e8c>] ? try_to_unmap+0x6c/0xf0
[ 4782.643950] [<ffffffff9d2e88a0>] ? try_to_unmap_nonlinear+0x5c0/0x5c0
[ 4782.644690] [<ffffffff9d2e70a0>] ? invalid_migration_vma+0x30/0x30
[ 4782.645273] [<ffffffff9d2e82e0>] ? page_remove_rmap+0x320/0x320
[ 4782.646072] [<ffffffff9d31717c>] migrate_pages+0x85c/0x930
[ 4782.646701] [<ffffffff9d2d0e20>] ? isolate_freepages_block+0x410/0x410
[ 4782.647407] [<ffffffff9d2cfa60>] ? arch_local_save_flags+0x30/0x30
[ 4782.648114] [<ffffffff9d2d1803>] compact_zone+0x4d3/0x8a0
[ 4782.650157] [<ffffffff9d2d1c2f>] compact_zone_order+0x5f/0xa0
[ 4782.651014] [<ffffffff9d2d1f87>] try_to_compact_pages+0x127/0x2f0
[ 4782.651656] [<ffffffff9d2b0c98>] __alloc_pages_direct_compact+0x68/0x200
[ 4782.652313] [<ffffffff9d2b17ca>] __alloc_pages_nodemask+0x99a/0xd90
[ 4782.652916] [<ffffffff9d300a1c>] alloc_pages_vma+0x13c/0x270
[ 4782.653618] [<ffffffff9d31d914>] ? do_huge_pmd_wp_page+0x494/0xc90
[ 4782.654487] [<ffffffff9d31d914>] do_huge_pmd_wp_page+0x494/0xc90
[ 4782.656045] [<ffffffff9d320d20>] ? __mem_cgroup_count_vm_event+0xd0/0x240
[ 4782.657089] [<ffffffff9d2dcb7d>] handle_mm_fault+0x8bd/0xc50
[ 4782.660931] [<ffffffff9d1d26e6>] ? __lock_is_held+0x56/0x80
[ 4782.662695] [<ffffffff9d0c7bc7>] __do_page_fault+0x1b7/0x660
[ 4782.663259] [<ffffffff9d1cdc5e>] ? put_lock_stats.isra.13+0xe/0x30
[ 4782.663851] [<ffffffff9d1abf41>] ? vtime_account_user+0x91/0xa0
[ 4782.664419] [<ffffffff9d2a2c35>] ? context_tracking_user_exit+0xb5/0x1b0
[ 4782.665119] [<ffffffff9db6e103>] ? __this_cpu_preempt_check+0x13/0x20
[ 4782.665969] [<ffffffff9d1ce2e2>] ? trace_hardirqs_off_caller+0xe2/0x1b0
[ 4782.666634] [<ffffffff9d0c8141>] trace_do_page_fault+0x51/0x2b0
[ 4782.667257] [<ffffffff9d0bee83>] do_async_page_fault+0x63/0xd0
[ 4782.667871] [<ffffffffa0511018>] async_page_fault+0x28/0x30

However, it wasn't followed by anything else, and I've seen the original issue
trigger without this WARN showing up, so this looks like a different,
unrelated issue?


Thanks,
Sasha