Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92

From: Glauber Costa
Date: Mon Jun 17 2013 - 11:14:23 EST


On Mon, Jun 17, 2013 at 04:18:22PM +0200, Michal Hocko wrote:
> Hi,

Hi,

> I managed to trigger:
> [ 1015.776029] kernel BUG at mm/list_lru.c:92!
> [ 1015.776029] invalid opcode: 0000 [#1] SMP
> with Linux next (next-20130607) with https://lkml.org/lkml/2013/6/17/203
> on top.
>
> This is obviously BUG_ON(nlru->nr_items < 0) and
> ffffffff81122d0b: 48 85 c0 test %rax,%rax
> ffffffff81122d0e: 49 89 44 24 18 mov %rax,0x18(%r12)
> ffffffff81122d13: 0f 84 87 00 00 00 je ffffffff81122da0 <list_lru_walk_node+0x110>
> ffffffff81122d19: 49 83 7c 24 18 00 cmpq $0x0,0x18(%r12)
> ffffffff81122d1f: 78 7b js ffffffff81122d9c <list_lru_walk_node+0x10c>
> [...]
> ffffffff81122d9c: 0f 0b ud2
>
> RAX is -1UL.

Yes. Fearing that kind of imbalance, we decided to keep the counter as a signed
quantity and BUG when it goes negative, rather than make it an unsigned quantity.
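
To illustrate the reasoning, here is a minimal userspace sketch, not the actual
mm/list_lru.c code. The names demo_lru_node, demo_lru_del and
demo_lru_del_unsigned are made up for this example. It shows why a signed
counter lets an extra decrement be caught, while an unsigned one silently wraps:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-node LRU; not the kernel's structure. */
struct demo_lru_node {
	long nr_items;	/* signed, so one decrement too many shows up as -1 */
};

/*
 * Account for one removed item. Returns false when the counter went
 * negative, which is the condition the kernel reports with
 * BUG_ON(nlru->nr_items < 0).
 */
static bool demo_lru_del(struct demo_lru_node *node)
{
	node->nr_items--;
	return node->nr_items >= 0;
}

/*
 * Same accounting with an unsigned counter. The value wraps around to
 * ULONG_MAX, and a "< 0" test could never fire, so the imbalance would
 * go unnoticed here.
 */
static void demo_lru_del_unsigned(unsigned long *nr_items)
{
	(*nr_items)--;
}
```

With one item on the list, the first demo_lru_del() succeeds and the second one
reports the imbalance, which matches RAX being -1UL in the report above.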

>
> I assume that the current backtrace is of no use and it would most
> probably be some shrinker which doesn't behave.
>
There are currently 3 users of list_lru in tree: dentries, inodes and xfs.
Assuming you are not using xfs, we are left with dentries and inodes.

The first thing to do is to find which one of them is misbehaving. You can try to find
this out from the address of the list_lru and where it lies in the superblock.
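
As a rough sketch of that idea (userspace only, with a mock layout): the
structures demo_list_lru and demo_super_block and the helper demo_classify are
hypothetical, and the member names s_dentry_lru / s_inode_lru are assumed to
mirror the real superblock fields. Given a candidate superblock and the
faulting list_lru address, the check is just a range comparison:

```c
#include <stddef.h>
#include <stdint.h>

/* Mock structures for illustration only; the real layouts differ. */
struct demo_list_lru {
	long nr_items;
	void *head;
};

struct demo_super_block {
	int s_flags;
	struct demo_list_lru s_dentry_lru;	/* assumed member name */
	struct demo_list_lru s_inode_lru;	/* assumed member name */
};

enum demo_lru_kind { DEMO_LRU_NONE, DEMO_LRU_DENTRY, DEMO_LRU_INODE };

/* True if addr falls inside the object [start, start + len). */
static int demo_addr_within(uintptr_t addr, const void *start, size_t len)
{
	uintptr_t s = (uintptr_t)start;

	return addr >= s && addr - s < len;
}

/* Tell which embedded LRU of this superblock, if any, contains addr. */
static enum demo_lru_kind demo_classify(const struct demo_super_block *sb,
					const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	if (demo_addr_within(a, &sb->s_dentry_lru, sizeof(sb->s_dentry_lru)))
		return DEMO_LRU_DENTRY;
	if (demo_addr_within(a, &sb->s_inode_lru, sizeof(sb->s_inode_lru)))
		return DEMO_LRU_INODE;
	return DEMO_LRU_NONE;
}
```

On a real system the same comparison would be done by hand, or in a debugger,
against each mounted superblock.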

Once we know which of them is misbehaving, then we'll have to figure out why.

Any special filesystem workload?

