Re: [PATCH 02 of 66] mm, migration: Fix race betweenshift_arg_pages and rmap_walk by guaranteeing rmap_walk finds PTEs createdwithin the temporary stack

From: Andrea Arcangeli
Date: Tue Nov 09 2010 - 16:37:47 EST


On Tue, Nov 09, 2010 at 12:01:55PM +0900, KOSAKI Motohiro wrote:
> Hi
>
> > From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> >
> > Page migration requires rmap to be able to find all migration ptes
> > created by migration. If the second rmap_walk clearing migration PTEs
> > misses an entry, it is left dangling causing a BUG_ON to trigger during
> > fault. For example;
> >
> > [ 511.201534] kernel BUG at include/linux/swapops.h:105!
> > [ 511.201534] invalid opcode: 0000 [#1] PREEMPT SMP
> > [ 511.201534] last sysfs file: /sys/block/sde/size
> > [ 511.201534] CPU 0
> > [ 511.201534] Modules linked in: kvm_amd kvm dm_crypt loop i2c_piix4 serio_raw tpm_tis shpchp evdev tpm i2c_core pci_hotplug tpm_bios wmi processor button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod sg sr_mod cdrom sd_mod ata_generic ahci libahci libata ide_pci_generic ehci_hcd ide_core r8169 mii ohci_hcd scsi_mod floppy thermal fan thermal_sys
> > [ 511.888526]
> > [ 511.888526] Pid: 20431, comm: date Not tainted 2.6.34-rc4-mm1-fix-swapops #6 GA-MA790GP-UD4H/GA-MA790GP-UD4H
> > [ 511.888526] RIP: 0010:[<ffffffff811094ff>] [<ffffffff811094ff>] migration_entry_wait+0xc1/0x129
>
> Do you mean current linus-tree code is broken? do we need to merge this
> ASAP and need to backport stable tree?

No current upstream is not broken here, but it's required with THP
unless I add the below block to huge_memory.c rmap walks too. I am
afraid to forget special cases like the below in the long term, so the
fewer special/magic cases there are in the kernel to remember forever
the better for me, as it's less likely I introduce hard-to-reproduce
bugs later on in the kernel when I add more rmap walks.

-
- /*
- * During exec, a temporary VMA is setup and later
- moved.
- * The VMA is moved under the anon_vma lock but not
- the
- * page tables leading to a race where migration
- cannot
- * find the migration ptes. Rather than increasing the
- * locking requirements of exec(), migration skips
- * temporary VMAs until after exec() completes.
- */
- if (PAGE_MIGRATION && (flags & TTU_MIGRATION) &&
- is_vma_temporary_stack(vma))
- continue;
-

Now if Linus tells me to drop this patch, I will add the above block
to huge_memory.c, and then I can drop this 2/66 patch immediately
(hopefully the user stack will never be born as an hugepage or the
randomizing mremap during execve will be executed with perfect 2m
aligned offsets or then it wouldn't work anymore, but for now I can
safely remove this patch and add the above block to huge_memory.c).

I see the ugliness in exec.c, but I think we're less likely to have
bugs in the kernel in the future without the above magic in every
remap walk that needs to be accurate (like huge_memory.c and
migrate.c), the rmap walks are complicated enough already for my
taste. But I'm totally fine either ways, as I think I can remember.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/