Re: v2.6.31-rc6: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

From: Linus Torvalds
Date: Mon Aug 24 2009 - 19:53:10 EST




On Mon, 24 Aug 2009, Linus Torvalds wrote:
>
> Untested. VERY untested. Just going by "that looks odd".

Btw, one issue here is that we at least sometimes do tty_ldisc_halt()
under the tty->ldisc_mutex. Now that's fine - as long as we never take
that lock inside any delayed work. If we do, the delayed work itself
may need the lock we hold in order to complete, and then the
'cancel_delayed_work_sync()' thing might deadlock.

And sadly, we do end up having 'do_tty_hangup()' as a workqueue entry, and
that one does tty_ldisc_hangup(), and that one in turn does take
tty->ldisc_mutex.

So it looks like either we can't use the 'sync()' version, or we should
never hold the ldisc_mutex while doing that tty_ldisc_halt(). Because
waiting for the workqueue while holding the mutex looks like it could
deadlock. It's probably very rare, but whatever.
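
To make the ordering problem concrete, here is a minimal userspace sketch
using pthreads rather than real kernel code. The names hangup_work(),
halt_sync() and release_like_path() are made up for illustration: the
mutex stands in for tty->ldisc_mutex, the thread for the do_tty_hangup()
work item, and pthread_join() for a synchronous cancel that waits for the
work to finish. It only shows the safe ordering; the unsafe one is left
as a comment because it may hang.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for tty->ldisc_mutex. */
static pthread_mutex_t ldisc_mutex = PTHREAD_MUTEX_INITIALIZER;
static int hangups_done;

/* Stand-in for the hangup work item: it needs the mutex to complete. */
static void *hangup_work(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&ldisc_mutex);
	hangups_done++;
	pthread_mutex_unlock(&ldisc_mutex);
	return NULL;
}

/* Stand-in for a synchronous halt: blocks until the work has finished. */
static void halt_sync(pthread_t work)
{
	int err = pthread_join(work, NULL);
	assert(err == 0);
}

/*
 * Safe ordering: wait for the work BEFORE taking the mutex.
 *
 * The unsafe ordering would be
 *	lock(ldisc_mutex); halt_sync(work); unlock(ldisc_mutex);
 * which hangs whenever the work has not yet taken the mutex: the waiter
 * holds the lock the work needs, and the work never completes.
 */
static int release_like_path(void)
{
	pthread_t work;
	int seen;
	int err = pthread_create(&work, NULL, hangup_work, NULL);
	assert(err == 0);

	halt_sync(work);		/* no lock held, the work can always finish */

	pthread_mutex_lock(&ldisc_mutex);
	seen = hangups_done;		/* the work is known to be done here */
	pthread_mutex_unlock(&ldisc_mutex);
	return seen;
}
```

Each call to release_like_path() runs one "hangup" to completion and
returns the running count. The point of the sketch is only the order of
the wait and the lock, not the details of the tty code.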

Still, it would be good for people to test whether that patch makes the
problem go away. Just to see if the issue really is a race between
"tty_ldisc_halt()" and an ldisc being active on another CPU right then.

But I wanted to let people know that the patch is clearly not the "last
word" on this. It's a useful thing to try, but we need something better.

And it looks like we've hit that problem before, which is probably why it
didn't use sync. Several of the callers of 'tty_ldisc_halt()' do a
flush_scheduled_work() afterwards, outside the ldisc_mutex. Of course, the
sane one (tty_ldisc_release()) does a tty_ldisc_halt() even before taking
the mutex lock.

Linus