[rcu_sched stall] regression/miss-config ?

From: Santosh Shilimkar
Date: Sun May 15 2016 - 17:18:44 EST


Hi Paul,

I was asking Sasha about [1], since other folks at Oracle
have also hit similar RCU stalls with the v4.1 kernel under
different workloads. A similar issue was reported to me with
RDS as well, and after looking at [1], [2], [3] and [4], I
thought of reaching out to see if you can help us understand
this issue better.

I have also included the RCU-specific config used in these
tests [5]. The issue is very hard to reproduce, but one
data point is that it reproduces on systems with a larger
number of CPUs (64+). The same workload on fewer than 64
CPUs doesn't show the issue. Someone also told me that using
the SLAB allocator instead of SLUB makes a difference, but I
haven't verified that for RDS.

Let me know your thoughts. Thanks in advance!

Regards,
Santosh

[1] https://lkml.org/lkml/2014/12/14/304
[2] log 1: http://pastebin.uk.oracle.com/iUr9qE
[3] log 2: http://pastebin.uk.oracle.com/Oe3cr5
[4] log 3: http://pastebin.uk.oracle.com/bMYLkD
[5] rcu config: http://pastebin.uk.oracle.com/e7NXTW