Re: [BUG] [ tip/sched/core ] System unresponsive after booting

From: Peter Zijlstra
Date: Thu Jan 16 2014 - 09:18:32 EST


On Thu, Jan 16, 2014 at 02:48:51PM +0100, Daniel Lezcano wrote:
> 3570 sched_getparam(3570, { 0 }) = 0
> 3570 sched_getscheduler(3570) = 0 (SCHED_OTHER)
> 3570 sched_get_priority_min(SCHED_OTHER) = 0
> 3570 sched_get_priority_max(SCHED_OTHER) = 0
> 3571 sched_get_priority_min(SCHED_OTHER) = 0
> 3571 sched_get_priority_max(SCHED_OTHER) = 0
> 3571 sched_get_priority_min(SCHED_OTHER) = 0
> 3571 sched_get_priority_max(SCHED_OTHER) = 0
> 3571 sched_setscheduler(3572, SCHED_OTHER, { 0 } <unfinished ...>
> 3571 <... sched_setscheduler resumed> ) = 0
> 3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
> 3571 <... sched_get_priority_min resumed> ) = 0
> 3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
> 3571 <... sched_get_priority_max resumed> ) = 0
> 3571 sched_setscheduler(3573, SCHED_OTHER, { 0 } <unfinished ...>
> 3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)
> 3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
> 3571 <... sched_get_priority_min resumed> ) = 0
> 3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
> 3571 <... sched_get_priority_max resumed> ) = 0
> 3571 sched_setscheduler(3574, SCHED_OTHER, { 0 } <unfinished ...>
> 3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)
>
> The same strace but on a kernel which does not hang. The calls to
> sched_setscheduler do not fail.
>
> 3292 sched_getparam(3292, { 0 }) = 0
> 3292 sched_getscheduler(3292) = 0 (SCHED_OTHER)
> 3292 sched_get_priority_min(SCHED_OTHER) = 0
> 3292 sched_get_priority_max(SCHED_OTHER) = 0
> 3293 sched_get_priority_min(SCHED_OTHER) = 0
> 3293 sched_get_priority_max(SCHED_OTHER) = 0
> 3293 sched_get_priority_min(SCHED_OTHER) = 0
> 3293 sched_get_priority_max(SCHED_OTHER) = 0
> 3293 sched_setscheduler(3294, SCHED_OTHER, { 0 } <unfinished ...>
> 3293 <... sched_setscheduler resumed> ) = 0
> 3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
> 3293 <... sched_get_priority_min resumed> ) = 0
> 3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
> 3293 <... sched_get_priority_max resumed> ) = 0
> 3293 sched_setscheduler(3295, SCHED_OTHER, { 0 } <unfinished ...>
> 3293 <... sched_setscheduler resumed> ) = 0
> 3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
> 3293 <... sched_get_priority_min resumed> ) = 0
> 3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
> 3293 <... sched_get_priority_max resumed> ) = 0
> 3293 sched_setscheduler(3296, SCHED_OTHER, { 0 } <unfinished ...>
> 3293 <... sched_setscheduler resumed> ) = 0
>
> The EPERM error comes from kernel/sched/core.c:3303
>
> ...
> 	if (fair_policy(policy)) {
> 		if (!can_nice(p, attr->sched_nice))
> 			return -EPERM;
> 	}
> ...
>
>
> But I don't know why this leads to a blocked process, or why rsyslogd is
> not woken up by a packet coming in on the af_unix socket.

Could you test with a fresh tip/master? Ingo just pushed out a stack of
fixes, in particular:

e3de300d1212b ("sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls")
39fd8fd22b322 ("sched: Fix up scheduler syscall LTP fails")

These could have affected things.

Meanwhile I'll take a closer look at the strace output above.
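
In case it helps to poke at this outside the boot sequence, here is a
rough userspace sketch of the failing call from the strace. The helper
names try_set_sched_other() and nice_within_rlimit() are made up for
illustration and are not from the kernel or from the report. The second
helper only mirrors my understanding of the rlimit half of can_nice()
(20 - nice compared against RLIMIT_NICE); it deliberately ignores
CAP_SYS_NICE, so treat it as an approximation, not the kernel's check.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Hypothetical helper: issue the same call seen in the strace,
 * sched_setscheduler(pid, SCHED_OTHER, { 0 }).
 * Returns 0 on success, otherwise the errno value (e.g. EPERM).
 */
static int try_set_sched_other(pid_t pid)
{
	struct sched_param param = { .sched_priority = 0 };

	if (sched_setscheduler(pid, SCHED_OTHER, &param) == -1)
		return errno;
	return 0;
}

/*
 * Hypothetical helper: approximate the rlimit part of can_nice().
 * Assumption: the kernel maps nice [19..-20] to an rlimit-style value
 * [1..40] as (20 - nice) and compares it with RLIMIT_NICE.
 * The capability check (CAP_SYS_NICE) is NOT modelled here.
 * Returns 1 if allowed by the rlimit, 0 if not, -1 if getrlimit() fails.
 */
static int nice_within_rlimit(int nice)
{
	struct rlimit rl;
	int nice_rlim = 20 - nice;

	if (getrlimit(RLIMIT_NICE, &rl) == -1)
		return -1;
	return (rlim_t)nice_rlim <= rl.rlim_cur ? 1 : 0;
}
```

Calling try_set_sched_other(0) targets the calling process itself; pass
the pid of another task to get closer to what pid 3571 does in the trace.
If that returns EPERM while nice_within_rlimit() for the task's current
nice value returns 0, the failure is at least consistent with the
can_nice() path quoted above.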

Thanks!