Re: 2.6.25-rc3-git3: Reported regressions from 2.6.24

From: Suresh Siddha
Date: Thu Mar 06 2008 - 18:13:52 EST


On Thu, Mar 06, 2008 at 02:57:39PM -0800, Andrew Morton wrote:
> On Thu, 6 Mar 2008 13:36:32 -0800
> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Thu, 6 Mar 2008 21:59:51 +0100
> > Ingo Molnar <mingo@xxxxxxx> wrote:
> >
> > >
> > > * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > I'd love to poke around in kgdb (what does kthread_stop_info.k point
> > > > at?) but it seems that -mm's copy of kgdb got taken away when I wasn't
> > > > looking. Can I have it back please?
> > >
> > > it's in the full x86.git or you can pick up the kgdb-light tree:
> > >
> > > http://people.redhat.com/mingo/kgdb-light.git/README
> > >
> >
> > We'll see.
> >
> > Meanwhile, further investigation show that cpu_callback() (the one in
> > kernel/softlockup.c) is waiting on this thread:
> >
> > watchdog/1 R running task 0 8 2 task_struct:ffff81025f1089e0
>
> Note the "/1".
>
> > ffff81025f10deb0 0000000000000046 0000000000000000 0000000000000246
> > ffff81025f10de20 ffff81025f1089e0 ffff81025f1080c0 ffff81025f108d30
> > 000000015f10de50 00000000ffff2adf ffffffffffffffff ffffffffffffffff
> > Call Trace:
> > [<ffffffff80263290>] ? watchdog+0x0/0x1dc
> > [<ffffffff802632d6>] watchdog+0x46/0x1dc
> > [<ffffffff80263290>] ? watchdog+0x0/0x1dc
> > [<ffffffff8024704d>] kthread+0x44/0x6b
> > [<ffffffff8020cd88>] child_rip+0xa/0x12
> > [<ffffffff80247009>] ? kthread+0x0/0x6b
> > [<ffffffff8020cd7e>] ? child_rip+0x0/0x12
> >
> > kthread_stop_info.k=ffff81025f1089e0
> >
> > (gdb) l *0xffffffff802632d6
> > 0xffffffff802632d6 is in watchdog (kernel/softlockup.c:229).
> > 224 */
> > 225 while (!kthread_should_stop()) {
> > 226 touch_softlockup_watchdog();
> > 227 schedule();
> > 228
> > 229 if (kthread_should_stop())
> > 230 break;
> > 231
> > 232 if (this_cpu == check_cpu) {
> > 233 if (sysctl_hung_task_timeout_secs)
> >
> > so this watchdog thread seems to be runnable, but not running. What would
> > cause this?
>
> At the start of the sysrq-T trace we have:
>
> sd 1:0:0:0: [sdb] Stopping disk
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> sd 0:0:0:0: [sda] Stopping disk
> ACPI: PCI interrupt for device 0000:05:00.1 disabled
> ACPI: PCI interrupt for device 0000:05:00.0 disabled
> ACPI: Preparing to enter system sleep state S5
> Disabling non-boot CPUs ...
> CPU 1 is now offline
> SysRq : Show State
> task PC stack pid father

I have been looking into a similar issue, which stops my system going into
standy.

>
> So CPU 1 is offline. But the comatose watchdog thread is pinned to CPU 1.
> Could this be related to the problem? By what means is a task which is
> pinned to a going-away CPU handled? How is this guy supposed to ever run
> again?

move_task_off_dead_cpu() should move that thread to another online cpu. But
for some reason it isn't running.

thanks,
suresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/