Re: PROBLEM: Kernel BUG in mfgpt_tick (cs5535-clockevt.c) on ALIX 2c3 - null call

From: David Kozub
Date: Mon Oct 16 2017 - 20:02:24 EST


On Mon, 16 Oct 2017, Thomas Gleixner wrote:

On Thu, 12 Oct 2017, David Kozub wrote:
On Thu, 12 Oct 2017, Thomas Gleixner wrote:
On Thu, 12 Oct 2017, Daniel Lezcano wrote:
The real question is why

/* Set the clock scale and enable the event mode for CMP2 */
val = MFGPT_SCALE | (3 << 8);

cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val);

is in the setup code at all.

The obvious place for this is in mfgpt_set_periodic() which gets called
when the clock event and the handler is set up in the core code. Up to that
point the interrupt handler is protected against shared interrupts via the
is_shutdown() check.

It is, but only after clockevents_config_and_register. But mfgpt_tick is
called before that (just after setup_irq).

Right, but the protection against the spurious interrupt there is not
sufficient. See patch below.

Thanks,

tglx

8<-------------------

--- a/drivers/clocksource/cs5535-clockevt.c
+++ b/drivers/clocksource/cs5535-clockevt.c
@@ -117,7 +117,8 @@ static irqreturn_t mfgpt_tick(int irq, void *dev_id)
/* Turn off the clock (and clear the event) */
disable_timer(cs5535_event_clock);

- if (clockevent_state_shutdown(&cs5535_clockevent))
+ if (clockevent_state_detached(&cs5535_clockevent) ||
+ clockevent_state_shutdown(&cs5535_clockevent))
return IRQ_HANDLED;

/* Clear the counter */

Hi Thomas,

Thanks for the patch. I can confirm that this resolves the issue.

I still think the fact that the handler is triggered at all is a bit strange but maybe ensuring the handler doesn't do anything bad is sufficient to declare this issue fixed. After all I'm probbly the last person in the known universe using this driver. :)

Would it not be nicer to also explicitly set cs5535_clockevent's state_use_accessors to CLOCK_EVT_STATE_DETACHED (in the initialization of cs5535_clockevent)? Maybe that field is not to be touched directly from outside, on the other hand in the current code the check in the handler relies on CLOCK_EVT_STATE_DETACHED being 0 (and so cs5535_clockevent's state_use_accessors being CLOCK_EVT_STATE_DETACHED by default).

I was also wondering how is it ensured that the handler can never see an inconsistent state of state_use_accessors but after going through the code I think the key is clockevents_register_device's raw_spin_lock_irqsave(&clockevents_lock, flags); (even though the set to DETACHED happens outside, but we're already starting in DETACHED state). I also wonder if this would still be safe if there were multiple CPUs - but this is not happening on the ALIX as it has just one CPU.

Best regards,
David