Re: Bug in on_each_cpu?

From: Andrew Morton
Date: Thu Mar 01 2007 - 10:29:26 EST


On Thu, 01 Mar 2007 03:34:18 -0800 Zachary Amsden <zach@xxxxxxxxxx> wrote:

> > Why is it a bug? Because there's a deadlock where this CPU is waiting for
> > CPU A to take the IPI, but CPU A is waiting (with interrupts disabled) for
> > this CPU to take an IPI.
> >
>
> Then the bug is not in on_each_cpu(). It is in the usage of
> clock_was_set(). For example, look at do_settimeofday in kernel/timer.c:
>
> write_sequnlock_irqrestore(&xtime_lock, flags);
>
> /* signal hrtimers about time change */
> clock_was_set();
>
> return 0;

Perhaps a WARN_ON(irqs_disabled()) in clock_was_set() would help. But probably
the one in smp_call_function() will suffice.

> And timekeeping_resume has similar code (and called from a sysdev
> callback, so I don't know what the interrupt state should be ). Either
> the write_sequnlock_irqrestore is redundant, and should be merely an
> write_sequnlock_irq, or the callsite is not prepared to handle enabling
> interrupts temporarily as must be done for on_each_cpu(), which is a
> pretty scary scenario.
>
> What would be really, really nice would be to statically check all
> callsites that issue irq disables actually keep irqs disabled.
> Presumably, there was a reason they disabled irqs, and re-enabling them
> underneath their noses, even if it is to avoid a race, breaks the logic
> behind that reason.

yup. I once played with adding warnings in places like spin_lock_irq(),
but there were false positives from places which were odd-but-correct.

It would be worth revisiting however.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/