Re: kswapd craziness round 2

From: Daniel J Blueman
Date: Mon Feb 18 2013 - 01:18:44 EST


On Monday, 18 February 2013 06:10:02 UTC+8, Jiri Slaby wrote:
> Hi,
>
> You still feel the sour taste of the "kswapd craziness in v3.7" thread,
> right? Welcome to hell, part two :{.
>
> I believe this started happening after the update from
> 3.8.0-rc4-next-20130125 to 3.8.0-rc7-next-20130211. The same as before,
> many hours of uptime are needed and perhaps some suspend/resume cycles
> too. Memory pressure is not high; there is plenty of I/O cache:
> # free
>              total       used       free     shared    buffers     cached
> Mem:       6026692    5571184     455508          0     351252    2016648
> -/+ buffers/cache:    3203284    2823408
> Swap:            0          0          0
>
> kswapd is working very hard, though:
> root 580 0.6 0.0 0 0 ? S úno12 46:21 [kswapd0]
>
> Right now it happens on I/O activity, for example when running updatedb or
> find /. This is what the stack trace of kswapd0 looks like:
> [<ffffffff8113c431>] shrink_slab+0xa1/0x2d0
> [<ffffffff8113ecd1>] kswapd+0x541/0x930
> [<ffffffff810a3000>] kthread+0xc0/0xd0
> [<ffffffff816beb5c>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
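
A quick Python check of the quoted `free` figures (all in kB) shows why memory pressure looks low. It assumes the classic procps definition of the `-/+ buffers/cache` row, where buffers and page cache count as reclaimable; under that assumption about 2.7 GiB is still available while kswapd spins.

```python
# Figures copied from the quoted `free` output (units: kB).
total, used, free = 6026692, 5571184, 455508
buffers, cached = 351252, 2016648

# Classic procps `free` derives the "-/+ buffers/cache" row by treating
# buffers and page cache as memory the kernel can reclaim on demand.
used_minus_cache = used - buffers - cached
free_plus_cache = free + buffers + cached

# The "Mem:" row is internally consistent: total = used + free.
assert total == used + free

print(used_minus_cache, free_plus_cache)  # 3203284 2823408
print(f"reclaimable share of RAM: {free_plus_cache / total:.1%}")
```

Both derived values match the `-/+ buffers/cache` row in the quoted output.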

Likewise, with 3.8-rc I've been able to reproduce [1] a livelock scenario that hoses the box, and RCU stalls are observed [2].

There may be a connection; I'll do a bit more debugging in the next few days.

Daniel

--- [1]

1. use a live-booted image running from a ramdisk
2. boot 3.8-rc with less than 16GB of memory and no swap
3. run the OpenMP NAS Parallel Benchmark dc.B against a local disk (i.e. not the ramdisk)
4. observe a hang on the order of 30 minutes later
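
While waiting for the hang in step 4, kswapd CPU consumption can be tracked from /proc. This is a minimal Python sketch, assuming Linux's /proc/<pid>/stat layout (utime and stime are fields 14 and 15, in clock ticks, the same accounting behind the TIME column in the quoted `ps` line). The helpers `cpu_seconds_from_stat`, `find_threads` and `sample_cpu_percent` are hypothetical names, not part of any existing tool.

```python
import os
import time


def cpu_seconds_from_stat(stat_line, clk_tck):
    """Return utime + stime, in seconds, from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so only split what follows the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / clk_tck


def find_threads(prefix):
    """Return (pid, comm) pairs for tasks whose comm starts with prefix."""
    found = []
    try:
        entries = os.listdir("/proc")
    except FileNotFoundError:
        return found  # no /proc: not a Linux system
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            continue  # task exited while we were scanning
        if comm.startswith(prefix):
            found.append((int(entry), comm))
    return found


def sample_cpu_percent(pids, interval=5.0):
    """Map each pid to the CPU it used over interval seconds (% of one CPU)."""
    clk_tck = os.sysconf("SC_CLK_TCK")

    def cpu_now(pid):
        with open(f"/proc/{pid}/stat") as f:
            return cpu_seconds_from_stat(f.read(), clk_tck)

    before = {pid: cpu_now(pid) for pid in pids}
    time.sleep(interval)
    return {pid: 100.0 * (cpu_now(pid) - before[pid]) / interval for pid in pids}


if __name__ == "__main__":
    # There is one kswapd thread per NUMA node: kswapd0, kswapd1, ...
    threads = find_threads("kswapd")
    if threads:
        usage = sample_cpu_percent([pid for pid, _ in threads])
        for pid, comm in threads:
            print(f"{comm} (pid {pid}): {usage[pid]:.1f}% CPU")
```

Sampling all kswapd threads across a single sleep keeps the measurement window identical for every node. A thread sitting near 100% of a CPU while `free` still reports gigabytes of cache would be consistent with the behaviour reported above.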

--- [2]

[ 2675.587878] INFO: rcu_sched self-detected stall on CPU { 5} (t=24000 jiffies g=6313 c=6312 q=68)
--
Daniel J Blueman
Principal Software Engineer, Numascale Asia