Re: [BUG] [ tip/sched/core ] System unresponsive after booting

From: Ingo Molnar
Date: Wed Jan 15 2014 - 04:23:01 EST



* Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:

>
> Hi all,
>
> I use the tip/sched/core branch.
>
> After git pulling yesterday, my host is unresponsive after booting the OS.
>
> * It boots normally
> * It sends info to the console
> * The graphics does not work
> * The terminals show the prompt, I can enter the username but after
> pressing enter, it does not give the password prompt
> * sysrq works more or less, I can't get the process stack but it
> receives the command
>
> It is like no new process can be created.
>
> I have a dual Xeon processor E5325 (2 x 4 cores).
>
> After git bisecting, the following patch seems to introduce the bug.
>
> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
> Author: Dario Faggioli <raistlin@xxxxxxxx>
> Date: Thu Nov 7 14:43:36 2013 +0100
>
> sched: Add new scheduler syscalls to support an extended
> scheduling parameters ABI
>
> Add the syscalls needed for supporting scheduling algorithms
> with extended scheduling parameters (e.g., SCHED_DEADLINE).
>
>
> [ ... ]
>
>
> Signed-off-by: Dario Faggioli <raistlin@xxxxxxxx>
> [ Rewrote to use sched_attr. ]
> Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxx>
> [ Removed sched_setscheduler2() for now. ]
> Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@xxxxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

I checked this patch again, and noticed a few oddities:

1)

There's this change to __setscheduler():

-__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
+/* Actually do priority change: must hold pi & rq lock. */
+static void __setscheduler(struct rq *rq, struct task_struct *p,
+ const struct sched_attr *attr)
{
+ int policy = attr->sched_policy;
+
p->policy = policy;
- p->rt_priority = prio;
+
+ if (rt_policy(policy))
+ p->rt_priority = attr->sched_priority;
+ else
+ p->static_prio = NICE_TO_PRIO(attr->sched_nice);
+


doesnt this change keep p->rt_priority uninitialized in the
normalize_task() case?

I.e. rt_priority should still be set unconditionally. In the
SCHED_NORMAL case that will be a zero initialization.

2)

It's not clear why this change to __setscheduler() was done:

/*
* Allow unprivileged RT tasks to decrease priority:
*/
if (user && !capable(CAP_SYS_NICE)) {
+ if (fair_policy(policy)) {
+ if (!can_nice(p, attr->sched_nice))
+ return -EPERM;
+ }
+
if (rt_policy(policy)) {


3)

On ARM:

-#define __NR_syscalls (380)
+#define __NR_syscalls (384)

but:

#define __NR_finit_module (__NR_SYSCALL_BASE+379)
+#define __NR_sched_setattr (__NR_SYSCALL_BASE+380)
+#define __NR_sched_getattr (__NR_SYSCALL_BASE+381)

/*


Why is the syscall table increased by 4, while we only add 2 new
syscalls?

4)

In hindsight this patch should have been 3 patches or so:

- one that just mechanically extends __setscheduler() with an 'attr'
way to pass parameters

- one that adds whatever other desired changes to __setscheduler(),
with an explanation.

- one that adds the new syscalls.

Which would ease the debugging of such bugs.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/