Re: [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled

From: Li Zhong
Date: Mon Feb 20 2012 - 20:42:48 EST


On Mon, 2012-02-20 at 12:00 +0100, Peter Zijlstra wrote:
> On Mon, 2012-02-20 at 14:01 +0800, Li Zhong wrote:
> > If CONFIG_KMEMCHECK is enabled, there might be page faults in nmi if the
> > pages are marked as not present by kmemcheck, like the following:
> >
> > [ 4.535803] WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb9/0xd0()
> > [ 4.633429] Hardware name: System x3650 M3 -[7945AC1]-
> > [ 4.694710] Modules linked in:
> > [ 4.731105] Pid: 1, comm: swapper/0 Not tainted 3.3.0-rc3 #15
> > [ 4.799654] Call Trace:
> > [ 4.828751] <NMI> [<ffffffff81042eca>] warn_slowpath_common+0x7a/0xb0
> > [ 4.907713] [<ffffffff81042f15>] warn_slowpath_null+0x15/0x20
> > [ 4.977301] [<ffffffff8103ce89>] kmemcheck_fault+0xb9/0xd0
> > [ 5.043778] [<ffffffff81551ba6>] do_page_fault+0x406/0x550
> > [ 5.110252] [<ffffffff8154e235>] page_fault+0x25/0x30
> > [ 5.171535] [<ffffffff8154f005>] ? nmi_handle.clone.1+0x75/0xc0
> > [ 5.243202] [<ffffffff8154efcf>] ? nmi_handle.clone.1+0x3f/0xc0
> > [ 5.314867] [<ffffffff8154ef90>] ? __die+0xf0/0xf0
> > [ 5.373038] [<ffffffff8154f15f>] do_nmi+0x10f/0x360
> > [ 5.432243] [<ffffffff8154e5cd>] restart_nmi+0x1a/0x1e
> > [ 5.494565] [<ffffffff8154e210>] ? general_protection+0x30/0x30
> > [ 5.566234] [<ffffffff8154e210>] ? general_protection+0x30/0x30
> > [ 5.637898] [<ffffffff8154e210>] ? general_protection+0x30/0x30
> > [ 5.709566] <<EOE>> [<ffffffff8126d814>] ? rb_insert_color+0xa4/0x150
> > [ 5.788526] [<ffffffff8119d17b>] sysfs_link_sibling+0x8b/0x110
> > [ 5.859155] [<ffffffff8119dff1>] __sysfs_add_one+0xc1/0x100
> > [ 5.926666] [<ffffffff8119e056>] sysfs_add_one+0x26/0xd0
> > [ 5.991065] [<ffffffff8119cdf4>] sysfs_add_file_mode+0xc4/0x100
> > [ 6.062731] [<ffffffff8119fc41>] internal_create_group+0xc1/0x1a0
> > [ 6.136473] [<ffffffff8119fd4e>] sysfs_create_group+0xe/0x10
> > [ 6.205026] [<ffffffff81351c1a>] dpm_sysfs_add+0x2a/0xd0
> > [ 6.269425] [<ffffffff81349bf5>] device_add+0x5e5/0x730
> > [ 6.332783] [<ffffffff81349d59>] device_register+0x19/0x20
> > [ 6.399260] [<ffffffff8135b6b8>] add_memory_section+0x158/0x1e0
> > [ 6.470927] [<ffffffff81ca757e>] memory_dev_init+0x75/0x108
> > [ 6.538439] [<ffffffff81ca73a9>] driver_init+0x31/0x33
> > [ 6.600762] [<ffffffff81c72c68>] kernel_init+0xcc/0x169
> > [ 6.664121] [<ffffffff81555e64>] kernel_thread_helper+0x4/0x10
> > [ 6.734749] [<ffffffff81c72b9c>] ? start_kernel+0x3ab/0x3ab
> > [ 6.802261] [<ffffffff81555e60>] ? gs_change+0x13/0x13
> > [ 6.864585] ---[ end trace a7919e7f17c0a725 ]---
> >
> > These two patches try to fix some of the problems by avoiding the use of
> > non-present pages.
>
>
> Hell no, these are some of the ugliest patches I've seen in a while. Not
> to mention that their changelogs are utter crap since they don't even
> explain why they're doing what they're doing.
>
Hi Peter,

I agree that the fix is ugly. I'm willing to change it if there is a
better way.

The problem here is:
1. x86 does not seem to allow page faults in NMI context, and the code
checks for this, e.g. WARN_ON_ONCE(in_nmi()).

2. If CONFIG_KMEMCHECK is enabled, pages allocated through slab are
marked as non-present in order to catch accesses to uninitialized
memory. More information is in Documentation/kmemcheck.txt.

3. According to the log, the NMI path accesses memory that lives in
pages marked as non-present by kmemcheck, because that memory was
allocated by something like kmalloc().
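
To make the interaction of the three points above concrete, here is a
small user-space model. It is only a sketch under assumptions: every name
in it (sim_alloc_tracked, sim_nmi, nmi_fault_warnings, and so on) is
hypothetical, it is not kernel code, and it is not what the patches do.
It only shows why touching a checker-hidden page from NMI context trips
the warning while the same access elsewhere does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NPAGES 4

/* Hypothetical stand-ins for kernel state; none are real kernel symbols. */
static bool page_present[NPAGES]; /* false = page hidden by the checker */
static bool nmi_context;          /* models what in_nmi() would report */
static int nmi_fault_warnings;    /* counts faults taken in NMI context */

/* Models a slab allocation with the checker enabled: page is hidden. */
static void sim_alloc_tracked(int page)
{
    assert(page >= 0 && page < NPAGES);
    page_present[page] = false;
}

/* Models an allocation the checker does not track: page stays mapped. */
static void sim_alloc_untracked(int page)
{
    assert(page >= 0 && page < NPAGES);
    page_present[page] = true;
}

/*
 * Models the fault path. Unlike WARN_ON_ONCE(), this counts every
 * fault taken in NMI context so the effect is easy to observe.
 */
static void sim_page_fault(int page)
{
    if (nmi_context) {
        nmi_fault_warnings++;
        fprintf(stderr, "warning: fault on page %d in NMI context\n", page);
    }
}

/* Models a memory access: a non-present page raises a fault. */
static void sim_access(int page)
{
    assert(page >= 0 && page < NPAGES);
    if (!page_present[page])
        sim_page_fault(page);
}

/* Models an NMI handler that touches the given page. */
static void sim_nmi(int page)
{
    nmi_context = true;
    sim_access(page);
    nmi_context = false;
}
```

In this model, calling sim_nmi() on a page set up with
sim_alloc_untracked() leaves nmi_fault_warnings unchanged, while calling
it on a page set up with sim_alloc_tracked() increments the counter. A
plain sim_access() on the tracked page still faults, but outside NMI
context, so no warning is counted.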

Thanks,
Zhong
