Re: [PATCH 4/7] hvc_console: Fix race between hvc_close andhvc_remove

From: Amit Shah
Date: Wed Mar 24 2010 - 08:23:12 EST


On (Sun) Mar 21 2010 [10:07:25], Amit Shah wrote:
> On (Sun) Mar 21 2010 [08:04:39], Benjamin Herrenschmidt wrote:
> > On Fri, 2010-03-19 at 08:18 -0700, Greg Kroah-Hartman wrote:
> > > From: Amit Shah <amit.shah@xxxxxxxxxx>
> > >
> > > Alan pointed out a race in the code where hvc_remove is invoked. The
> > > recent virtio_console work is the first user of hvc_remove().
> >
> > This causes hangs during boot on pseries machines. Haven't had a chance
> > to track that down yet, but please revert
> > e74d098c66543d0731de62eb747ccd5b636a6f4c for now.
>
> I tested this with the virtio-console driver with multiple consoles on
> Linux-2.6.34-rc2 inside a KVM VM. I didn't face any lockups / badness,
> but the following popped up once on -smp 4.
>
> Could this be the same bug that's causing a lockup for you?
>
> Since the virtio-console isn't the primary console on my VM, I didn't
> see a lockup but maybe powerpc uses this as the primary console and the
> lockup is a result of this?
>
> I've not investigated this; I might get around to it next week, but
> CC'ing the sched people on this one.

I just looked at this:

> (My .config is after the oops.)
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffff8101f4dc>] task_is_waking+0x1/0x1f
> PGD 3d261067 PUD 3d013067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> last sysfs file: /sys/devices/virtual/block/ram13/removable
> CPU 0
> Modules linked in:
>
> Pid: 573, comm: console_check Not tainted 2.6.34-rc2 #102 /Bochs
> RIP: 0010:[<ffffffff8101f4dc>] [<ffffffff8101f4dc>]
> task_is_waking+0x1/0x1f
> RSP: 0018:ffff88003bdf5b48 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81646e30
> RDX: ffff88003bdf5b78 RSI: ffff88003bdf5ba0 RDI: 0000000000000000
> RBP: ffff88003bdf5b78 R08: 0000000000000000 R09: ffffffff81646e08
> R10: 0000000000000046 R11: 0000000000001130 R12: 00000000001d1d00
> R13: 0000000000000000 R14: ffff88003bdf5ba0 R15: 000000000000000f
> FS: 00007f330731b6f0(0000) GS:ffff880003800000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000003be78000 CR4: 00000000000006b0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process console_check (pid: 573, threadinfo ffff88003bdf4000, task
> ffff88003bc3a2d0)
> Stack:
> ffff88003bdf5b78 ffffffff8102058e 0000000000000000 0000000000000000
> <0> 0000000000000000 0000000000000000 ffff88003bdf5bd8 ffffffff81026f03
> <0> ffff88003ead8cd8 ffff88003eb10490 ffff88003bdf5bd8 ffffffff8118cea9
> Call Trace:
> [<ffffffff8102058e>] ? task_rq_lock+0x24/0x98
> [<ffffffff81026f03>] try_to_wake_up+0x4b/0x33b
> [<ffffffff8118cea9>] ? resize_console+0x25/0x95
> [<ffffffff8102721f>] wake_up_process+0x10/0x12
> [<ffffffff8118c18e>] hvc_kick+0x1a/0x1c
> [<ffffffff8118cbb4>] hvc_open+0xf6/0x102

And this suggests that hvc_kick() is called before hvc_task is
initialised, ie, before hvc_init() is called.

Does this help?

diff --git a/drivers/char/hvc_console.c b/drivers/char/hvc_console.c
index ba55bba..50ac983 100644
--- a/drivers/char/hvc_console.c
+++ b/drivers/char/hvc_console.c
@@ -285,6 +285,9 @@ EXPORT_SYMBOL_GPL(hvc_instantiate);
/* Wake the sleeping khvcd */
void hvc_kick(void)
{
+ if (!hvc_task)
+ return;
+
hvc_kicked = 1;
wake_up_process(hvc_task);
}

Amit
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/