Re: [BUG 2.6.25-rc3] scheduler/hotplug: some processes are deadlocked when cpu is set to offline

From: Gautham R Shenoy
Date: Fri Mar 07 2008 - 04:20:22 EST


On Fri, Mar 07, 2008 at 05:54:51AM +0300, Oleg Nesterov wrote:
> On 03/06, Gautham R Shenoy wrote:
> >
> > On Tue, Mar 04, 2008 at 06:01:07PM +0300, Oleg Nesterov wrote:
> > > +static void check_running_task(struct task_struct *t, unsigned long now)
> > > +{
> > > + if (!sysctl_hung_task_timeout_secs)
> > > + return;
> > > +
> >
> > This function gets called only when t->xxx == 0,
> > so the if below doesn't mean much, does it? :)
> >
> > > > + if (time_before(now, t->xxx + HZ * sysctl_hung_task_timeout_secs))
> > > + return;
> > .......
> >
> > > @@ -192,15 +214,17 @@ static void check_hung_uninterruptible_t
> > > if ((tainted & TAINT_DIE) || did_panic)
> > > return;
> > >
> > > - read_lock(&tasklist_lock);
> > > + rcu_read_lock();
> > > do_each_thread(g, t) {
> > > if (!--max_count)
> > > goto unlock;
> > > if (t->state & TASK_UNINTERRUPTIBLE)
> > > check_hung_task(t, now);
> > > + if (!t->xxx)
> > > + check_running_task(t, jiff);
>
> Of course, the check above should be
>
> if (t->xxx)
> check_running_task(t, jiff);
>
> Thanks!
>
> From another message,
> >
> > Me too. With your patch applied there were quite a few tasks in the
> > running state which didn't get the cpu for more than 120 seconds.
>
> (I assume you fixed the patch before using it ;)
No! On the contrary, I fixed the patch because I found this behaviour a
bit odd. I couldn't run the tests again as it was rather late.
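
To illustrate the corrected guard, here is a minimal user-space sketch.
It assumes t->xxx holds a jiffies-style timestamp that stays zero while
unset (the thread does not say what the field really means). The struct,
the HZ value, timeout_secs and time_before_ul() are hypothetical
stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

#define HZ 100UL	/* assumed tick rate, for this sketch only */

/* Stand-in for sysctl_hung_task_timeout_secs; 0 disables the check. */
static unsigned long timeout_secs = 120;

/* Hypothetical reduced task: xxx is 0 when no timestamp is recorded. */
struct task {
	unsigned long xxx;
};

/* Wraparound-safe "a is before b", in the spirit of time_before(). */
static bool time_before_ul(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Returns true if the task would be reported as starved. */
static bool check_running_task(const struct task *t, unsigned long now)
{
	if (!timeout_secs)
		return false;
	if (time_before_ul(now, t->xxx + HZ * timeout_secs))
		return false;
	return true;
}

/* Corrected guard: only tasks with a recorded timestamp are checked. */
static bool scan_one(const struct task *t, unsigned long now)
{
	if (t->xxx)
		return check_running_task(t, now);
	return false;
}
```

With the original "!t->xxx" guard, the comparison runs against a zero
timestamp, so under these assumptions every such task would be reported
once more than the timeout's worth of ticks has elapsed. That would be
consistent with the odd reports mentioned above, though it is not
confirmed here.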

>
> Just to be sure, there were no "bad ->cpu..." messages, yes?

Hopefully I should be able to catch them now. If so, it's a problem in
the way we do migration after cpu-hotplug, as Yi suggested in an earlier
mail.

http://lkml.org/lkml/2008/3/6/437

This mail from akpm says the same thing.

>
> Oleg.

--
Thanks and Regards
gautham