Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92

From: Glauber Costa
Date: Fri Jun 28 2013 - 10:31:38 EST


On Fri, Jun 28, 2013 at 10:39:43AM +0200, Michal Hocko wrote:
> I have just triggered this one.
>
> [37955.364062] RIP: 0010:[<ffffffff81127e5b>] [<ffffffff81127e5b>] list_lru_walk_node+0xab/0x140
> [37955.364062] RSP: 0000:ffff8800374af7b8 EFLAGS: 00010286
> [37955.364062] RAX: 0000000000000106 RBX: ffff88002ead7838 RCX: ffff8800374af830
Note rbx

> [37955.364062] RDX: 0000000000000107 RSI: ffff88001d250dc0 RDI: ffff88002ead77d0
> [37955.364062] RBP: ffff8800374af818 R08: 0000000000000000 R09: ffff88001ffeafc0
> [37955.364062] R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001d250dc0
> [37955.364062] R13: 00000000000000a0 R14: 000000572ead7838 R15: ffff88001d250dc8
Note r14


> [37955.364062] Process as (pid: 3351, threadinfo ffff8800374ae000, task ffff880036d665c0)
> [37955.364062] Stack:
> [37955.364062] ffff88001da3e700 ffff8800374af830 ffff8800374af838 ffffffff811846d0
> [37955.364062] 0000000000000000 ffff88001ce75c48 01ff8800374af838 ffff8800374af838
> [37955.364062] 0000000000000000 ffff88001ce75800 ffff8800374afa08 0000000000001014
> [37955.364062] Call Trace:
> [37955.364062] [<ffffffff811846d0>] ? insert_inode_locked+0x160/0x160
> [37955.364062] [<ffffffff8118496c>] prune_icache_sb+0x3c/0x60
> [37955.364062] [<ffffffff8116dcbe>] super_cache_scan+0x12e/0x1b0
> [37955.364062] [<ffffffff8111354a>] shrink_slab_node+0x13a/0x250
> [37955.364062] [<ffffffff8111671b>] shrink_slab+0xab/0x120
> [37955.364062] [<ffffffff81117944>] do_try_to_free_pages+0x264/0x360
> [37955.364062] [<ffffffff81117d90>] try_to_free_pages+0x130/0x180
> [37955.364062] [<ffffffff81001974>] ? __switch_to+0x1b4/0x550
> [37955.364062] [<ffffffff8110a2fe>] __alloc_pages_slowpath+0x39e/0x790
> [37955.364062] [<ffffffff8110a8ea>] __alloc_pages_nodemask+0x1fa/0x210
> [37955.364062] [<ffffffff8114d1b0>] alloc_pages_vma+0xa0/0x120
> [37955.364062] [<ffffffff81129ebb>] do_anonymous_page+0x16b/0x350
> [37955.364062] [<ffffffff8112f9c5>] handle_pte_fault+0x235/0x240
> [37955.364062] [<ffffffff8107b8b0>] ? set_next_entity+0xb0/0xd0
> [37955.364062] [<ffffffff8112fcbf>] handle_mm_fault+0x2ef/0x400
> [37955.364062] [<ffffffff8157e927>] __do_page_fault+0x237/0x4f0
> [37955.364062] [<ffffffff8116a8a8>] ? fsnotify_access+0x68/0x80
> [37955.364062] [<ffffffff8116b0b8>] ? vfs_read+0xd8/0x130
> [37955.364062] [<ffffffff8157ebe9>] do_page_fault+0x9/0x10
> [37955.364062] [<ffffffff8157b348>] page_fault+0x28/0x30
> [37955.364062] Code: 44 24 18 0f 84 87 00 00 00 49 83 7c 24 18 00 78 7b 49 83 c5 01 48 8b 4d a8 48 8b 11 48 8d 42 ff 48 85 d2 48 89 01 74 78 4d 39 f7 <49> 8b 06 4c 89 f3 74 6d 49 89 c6 eb a6 0f 1f 84 00 00 00 00 00
> [37955.364062] RIP [<ffffffff81127e5b>] list_lru_walk_node+0xab/0x140
>
> ffffffff81127e0e: 48 8b 55 b0 mov -0x50(%rbp),%rdx
> ffffffff81127e12: 4c 89 e6 mov %r12,%rsi
> ffffffff81127e15: 48 89 df mov %rbx,%rdi
> ffffffff81127e18: ff 55 b8 callq *-0x48(%rbp) # isolate(item, &nlru->lock, cb_arg)
> ffffffff81127e1b: 83 f8 01 cmp $0x1,%eax
> ffffffff81127e1e: 74 78 je ffffffff81127e98 <list_lru_walk_node+0xe8>
> ffffffff81127e20: 73 4e jae ffffffff81127e70 <list_lru_walk_node+0xc0>
> [...]
One interesting thing I have noted here is that r14 is basically the lower half of rbx with
the upper half corrupted: rbx = ffff88002ead7838 vs r14 = 000000572ead7838, so the low 32 bits
(2ead7838) match while the high 32 bits do not.

Since we are talking about a single 64-bit word, this does not look like the usual
update-half-of-a-double-word-without-locking issue.

From your excerpt it is not totally clear what r14 is. But rdi is 0xffff88002ead77d0, which
by the calling convention is very probably nlru->lock, and that would indicate that rbx holds
nlru->list, assuming you have spinlock debugging enabled.
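
For reference, the per-node structure in this series looks roughly like the sketch below
(a paraphrase from memory, not a verbatim quote of mm/list_lru.c). With spinlock debugging
enabled the debug fields fatten spinlock_t, which is why the list head can sit a fair
distance past the lock address:

/*
 * Illustrative sketch of the per-node LRU structure in linux-next at the
 * time; field names from the list_lru series, offsets approximate.
 */
struct list_lru_node {
	spinlock_t		lock;		/* rdi in the oops: &nlru->lock */
	struct list_head	list;		/* rbx: &nlru->list */
	long			nr_items;	/* signed, to catch imbalances */
} ____cacheline_aligned_in_smp;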

So yes, someone destroyed our next pointer, and amazingly only half of it.

Still, the only time we ever release this lock is when isolate returns LRU_RETRY. Maybe the
way we restart is wrong? (although I can't see how)
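
For context, the walk looks roughly like the sketch below (paraphrased from the linux-next
mm/list_lru.c of this series; details may differ in your tree). Note that
list_for_each_safe() caches the next pointer, which is exactly the value that is corrupted
here:

/* Paraphrased sketch of list_lru_walk_node(); not verbatim kernel code. */
unsigned long
list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
		   void *cb_arg, unsigned long *nr_to_walk)
{
	struct list_lru_node *nlru = &lru->node[nid];
	struct list_head *item, *n;
	unsigned long isolated = 0;

	spin_lock(&nlru->lock);
restart:
	list_for_each_safe(item, n, &nlru->list) {
		switch (isolate(item, &nlru->lock, cb_arg)) {
		case LRU_REMOVED:
			nlru->nr_items--;
			isolated++;
			break;
		case LRU_ROTATE:
			list_move_tail(item, &nlru->list);
			break;
		case LRU_SKIP:
			break;
		case LRU_RETRY:
			/*
			 * The callback dropped nlru->lock, so the cached
			 * next pointer (n) is stale: restart from the head.
			 */
			goto restart;
		}
		if (!--(*nr_to_walk))
			break;
	}
	spin_unlock(&nlru->lock);
	return isolated;
}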

An iput() happens outside the lock in that case, but it seems safe: if that ends up
manipulating the lru, it will do so through our accessors.
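
Concretely, the path I mean is the buffers/pagecache case in inode_lru_isolate(). Below is a
paraphrase from memory of that series (the reference/state checks and the dispose-list
handling are elided), not verbatim fs/inode.c:

/* Paraphrased sketch of the LRU_RETRY path in inode_lru_isolate(). */
static enum lru_status
inode_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg)
{
	struct inode *inode = container_of(item, struct inode, i_lru);

	/* ... i_count/i_state checks elided ... */

	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(lru_lock);	/* walker's next pointer goes stale here */
		/* buffers dropped and our reference released outside the lock */
		iput(inode);
		spin_lock(lru_lock);
		return LRU_RETRY;	/* tells the walker to restart */
	}

	/* ... isolation onto the dispose list elided ... */
	return LRU_REMOVED;
}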

I will have to think a bit more... Did anything else strange happen before it?