Re: [RFC PATCH] hrtimer: remove deadlock due to waiting on IPI in softirq context

From: Rik van Riel
Date: Wed Mar 05 2014 - 16:55:58 EST


On 03/05/2014 04:51 PM, Thomas Gleixner wrote:
> On Wed, 5 Mar 2014, Rik van Riel wrote:
>> There appears to be a deadlock in the hrtimer code. Specifically,
>> clock_was_set() calls an IPI with wait=1, from softirq context.
>>
>> This should not be called from softirq context.
>>
>> Waiting for IPIs to complete in irq context can lead to a deadlock,
>> because the current code (that was interrupted) might be holding some
>> kind of lock that another CPU is waiting for with spin_lock_irq or
>> similar.
>>
>> In other words, the current CPU may need to release a resource before
>> the IPI can be handled by one of the destination CPUs.
>>
>> To my untrained eye, it does not look like this patch introduces a
>> new bug to the timer code, but that is hard to ascertain with the
>> timer code, so I am posting this as an RFC for the timer gods to hurt
>> their brains on :)
>>
>> This bug was introduced by 54cdfdb4 in early 2007 (the original
>> hrtimer code patch).
>
> Right and we had some issues with that until we moved the calls to
> clock_was_set() out of lock held regions.

Ahh indeed, the bug got fixed already :)

> The only call which happens from interrupt context is in
> update_wall_time(). And that one definitely holds no locks which are
> relevant.
>
> On which kernel are you observing the issue?

This was RHEL6, and I saw that the immediate function
was still the same upstream.

I forgot to check that clock_was_set() is now called
in a different way. My bad.
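
For anyone reading along, the circular wait described in the original
report can be sketched as a toy user-space model. This is not kernel
code and does not touch the real hrtimer or IPI paths; struct cpu_state,
wait_cycle() and ipi_wait_deadlocks() are made-up names for illustration,
and the model assumes just two CPUs.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 8

/* Toy model: each CPU is either runnable or blocked on one other CPU. */
struct cpu_state {
	int waits_for;	/* index of the CPU this one waits on, -1 if runnable */
};

/* Follow waits_for edges from 'start'; true if they lead back to 'start'. */
static bool wait_cycle(const struct cpu_state *cpus, int ncpus, int start)
{
	int cur, steps;

	assert(ncpus > 0 && ncpus <= MAX_CPUS);
	assert(start >= 0 && start < ncpus);

	cur = cpus[start].waits_for;
	for (steps = 0; steps < ncpus && cur != -1; steps++) {
		assert(cur >= 0 && cur < ncpus);
		if (cur == start)
			return true;
		cur = cpus[cur].waits_for;
	}
	return false;
}

/*
 * Hypothetical scenario: CPU 0 sends an IPI with wait=1 from irq context
 * and waits for CPU 1 to run it. If the code interrupted on CPU 0 holds a
 * lock that CPU 1 is spinning on with interrupts disabled, CPU 1 in turn
 * waits for CPU 0 and can never take the IPI.
 */
static bool ipi_wait_deadlocks(bool sender_holds_lock)
{
	struct cpu_state cpus[2] = {
		{ .waits_for = 1 },				/* CPU 0: IPI wait */
		{ .waits_for = sender_holds_lock ? 0 : -1 },	/* CPU 1: lock spin or free */
	};

	return wait_cycle(cpus, 2, 0);
}
```

In this model ipi_wait_deadlocks(true) reports a cycle, while
ipi_wait_deadlocks(false) does not, which corresponds to the case where
the caller holds no relevant locks.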