Re: [PATCH] [2/2] Don't complain about disabled irqs when the system has paniced

From: Nick Piggin
Date: Wed Sep 03 2008 - 02:02:57 EST


On Wednesday 03 September 2008 01:00, Andi Kleen wrote:
> On Tue, Sep 02, 2008 at 04:45:17PM +0200, Peter Zijlstra wrote:

> > The deadlock scenario is long the lines of two such smp_call_function*()
> > both under irq disabled calling each other with CSD_FLAG_WAIT set.
> > Neither remote cpu will handle the IPI due to irqs being disabled, so
> > we'll wait at-infinitum for completion.
>
> First smp_send_stop does not wait (or at least not pass the
> wait flag, it will still wait for the first ack like everyone else)

It won't wait, but it may have to wait for an ack as you say (but now
this is a very rare case when kmalloc fails rather than always having
to wait for so long).

So yes, it does wait in some cases. If interrupts are disabled then it
will stop processing IPIs which are sent to it from another CPU, which
might be also calling smp_send_stop and which itself is not processing
IPIs from the CPU. This is the deadlock.

We could serialise smp_send_stop (using a simple xchg, in case we panic
in lockdep), and then it is possible to get away without deadlocking.


> I don't claim to understand the new kernel/smp.c code (it seems to me quite
> overdesigned and complicated and I admit I got lost in it somewhere),
> but I think your scenario would rely on a global lock and presumably there
> is none in the new code?

Overdesigned? Well the old code was horribly slow and synchronized, and
made it useless for doing anything except things like smp_send_stop so
I would say it was under designed ;)


> > > Besides do you prefer to not allow panic from interrupts/machine
> > > checks etc. anymore?
> >
> > However did I imply that, I just said your fix looked iffy.
>
> Well it would be the only alternative. Or have a timeout (I had
> such a hack a long time ago) but that has also other issues.
>
> In fact for smp_send_stop() it would be far better to just use an NMI,
> but we unfortunately have a few BIOS that do not support NMI properly.
>
> I think for 2.6.27 at least this is the best fix. At least keeping
> panic that broken is no option I would say.

It is reasonable I think, but I don't like testing symbolic constants
with inequalities like in your patch 2/2. Can you just make it
system_state != SYSTEM_PANIC ?

Also... is it really a regression? The old code definitely had deadlock
warnings too, but maybe they were made more complete in the new code?

Thanks,
Nick
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/