Re: pipe/page fault oddness.

From: Sasha Levin
Date: Thu Oct 02 2014 - 11:04:56 EST


On 10/01/2014 06:42 PM, Linus Torvalds wrote:
> On Wed, Oct 1, 2014 at 3:08 PM, Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
>>
>> I've tried this patch on the same configuration that was triggering
>> the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it
>> ran fine for ~20 minutes before exploding with:
> Well, that's somewhat encouraging. I didn't expect it to be perfect.
>
> That said, "ran fine" isn't necessarily the same thing as "worked".
> Who knows how buggy it was without showing overt symptoms until the
> BUG_ON() triggered. But hey, I'll be optimistic.
>
>> [ 2781.566206] kernel BUG at mm/huge_memory.c:1293!
> So that's
>
> BUG_ON(is_huge_zero_page(page));
>
> and the reason is trivial: the old code used to have a magical special
> case for the zero-page hugepage (see change_huge_pmd()) and I got rid
> of that (because now it's just about setting protections, and the
> zero-page hugepage is in no way special).
>
> So I think the solution is equally trivial: just accept that the
> zero-page can happen, and ignore it (just un-numa it).
>
> Appended is an incremental diff on top of the previous one. Even less
> tested than the last case, but I think you get the idea if it doesn't
> work as-is.

I have a new one for you. I know it doesn't say "numa" anywhere, but I
haven't ever seen that trace before so I'll just go ahead and blame it
on your patch...

[ 2838.403382] BUG: unable to handle kernel paging request at 000000055d996e80
[ 2838.405740] IP: task_curr (kernel/sched/core.c:1010)
[ 2838.407076] PGD dba2c6067 PUD 0
[ 2838.407926] Thread overran stack, or stack corrupted
[ 2838.409093] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2838.411454] Dumping ftrace buffer:
[ 2838.412602] (ftrace buffer empty)
[ 2838.413187] Modules linked in:
[ 2838.413187] CPU: 38 PID: 9342 Comm: trinity-c38 Not tainted 3.17.0-rc7-sasha-00041-g6c9c81b #1260
[ 2838.413187] task: ffff880dba2f0000 ti: ffff880dba2ec000 task.ti: ffff880dba2ec000
[ 2838.413187] RIP: task_curr (kernel/sched/core.c:1010)
[ 2838.413187] RSP: 0018:ffff880dba2ebf48 EFLAGS: 00010046
[ 2838.413187] RAX: 000000000000f080 RBX: ffff880dba2f0000 RCX: 000000000000000a
[ 2838.413187] RDX: 00000000ba1a9560 RSI: ffff880dba2f0000 RDI: ffff880dba2f0000
[ 2838.413187] RBP: ffff880dba2ebf98 R08: 000000000004862a R09: 0000000000000000
[ 2838.413187] R10: 0000000000000038 R11: 000000000000001f R12: ffff880dba2f0000
[ 2838.413187] R13: ffff880dd5420740 R14: 000000000000000b R15: ffffffff8cc92000
[ 2838.413187] FS: 00007f05f3dbc700(0000) GS:ffff880701e00000(0000) knlGS:0000000000000000
[ 2838.413187] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2838.413187] CR2: 000000055d996e80 CR3: 0000000dba2c5000 CR4: 00000000000006a0
[ 2838.413187] DR0: 00000000006ee000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2838.413187] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
[ 2838.413187] Stack:
[ 2838.413187] ffffffff8816218b 0000000000000000 ffff880d0000000a 000000000000000b
[ 2838.413187] 0000000000000082 ffff880dba2f0000 000000000000000b ffff880dba2ec070
[ 2838.413187] 0000000000000000 ffffffff8cc92000 ffff880dba2ebff8 ffffffff88162a84
[ 2838.413187] Call Trace:
[ 2838.413187] <UNK>
[ 2838.413187] Code: 87 60 09 00 00 01 e8 8d ee ff ff 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 57 08 55 48 c7 c0 80 f0 00 00 48 89 e5 5d 8b 52 18 <48> 8b 14 d5 80 c3 c4 8c 48 39 bc 10 68 09 00 00 0f 94 c0 0f b6
All code
========
0: 87 60 09 xchg %esp,0x9(%rax)
3: 00 00 add %al,(%rax)
5: 01 e8 add %ebp,%eax
7: 8d (bad)
8: ee out %al,(%dx)
9: ff (bad)
a: ff 5d c3 lcallq *-0x3d(%rbp)
d: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1)
14: 00 00 00 00
18: 48 8b 57 08 mov 0x8(%rdi),%rdx
1c: 55 push %rbp
1d: 48 c7 c0 80 f0 00 00 mov $0xf080,%rax
24: 48 89 e5 mov %rsp,%rbp
27: 5d pop %rbp
28: 8b 52 18 mov 0x18(%rdx),%edx
2b:* 48 8b 14 d5 80 c3 c4 mov -0x733b3c80(,%rdx,8),%rdx <-- trapping instruction
32: 8c
33: 48 39 bc 10 68 09 00 cmp %rdi,0x968(%rax,%rdx,1)
3a: 00
3b: 0f 94 c0 sete %al
3e: 0f b6 00 movzbl (%rax),%eax

Code starting with the faulting instruction
===========================================
0: 48 8b 14 d5 80 c3 c4 mov -0x733b3c80(,%rdx,8),%rdx
7: 8c
8: 48 39 bc 10 68 09 00 cmp %rdi,0x968(%rax,%rdx,1)
f: 00
10: 0f 94 c0 sete %al
13: 0f b6 00 movzbl (%rax),%eax
[ 2838.413187] RIP task_curr (kernel/sched/core.c:1010)
[ 2838.413187] RSP <ffff880dba2ebf48>
[ 2838.413187] CR2: 000000055d996e80
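
As a cross-check on the trace above, the faulting address can be recomputed
from the trapping instruction and the register dump. The sketch below is not
part of the original report: the helper names are made up for illustration,
and it assumes RDX in the dump still holds the index used by the faulting
load (which should be the case, since that instruction never completed).

```python
MASK64 = (1 << 64) - 1


def sign_extend32(value):
    """Interpret the low 32 bits of value as a signed displacement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def effective_address(disp32, index, scale):
    """Address of a `disp32(,%index,scale)` operand, wrapped to 64 bits."""
    return (sign_extend32(disp32) + index * scale) & MASK64


# Values copied from the oops above.
DISP = -0x733b3c80          # displacement in "mov -0x733b3c80(,%rdx,8),%rdx"
RDX = 0x00000000ba1a9560    # RDX from the register dump
CR2 = 0x000000055d996e80    # faulting address reported by the kernel

addr = effective_address(DISP, RDX, 8)
print(hex(addr))            # recomputed faulting address
print(addr == CR2)          # True if it matches the reported CR2
```

The recomputed address matches CR2, so the fault comes from the scaled index
rather than from the base of the table. The load looks like a per-CPU table
lookup indexed by the task's CPU number (an interpretation, not something
stated in the trace), and 0xba1a9560 is not a plausible CPU number on this
machine, which would be consistent with the "Thread overran stack, or stack
corrupted" line.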


Thanks,
Sasha