Re: [PATCH v5 01/20] EDAC/synopsys: Fix ECC status data and IRQ disable race condition

From: Borislav Petkov
Date: Sun Apr 21 2024 - 06:08:19 EST


On Tue, Apr 16, 2024 at 01:06:11PM +0300, Serge Semin wrote:
> It looks indeed crazy because the method is called enable_intr() and
> is called in the IRQ handler. Right, re-enabling the IRQ in the handler
> doesn't look good. But under the hood it was just a way to fix the
> problem described in the commit you cited. enable_intr() just gets
> back the IRQ Enable flags cleared a bit before in the
> zynqmp_get_error_info() method.
>
> The root cause of the problem is that the IRQ status/clear flags:
> ECCCLR.ecc_corrected_err_clr (R/W1C)
> ECCCLR.ecc_uncorrected_err_clr (R/W1C)
> ECCCLR.ecc_corr_err_cnt_clr (R/W1C)
> ECCCLR.ecc_uncorr_err_cnt_clr (R/W1C)
> etc
>
> and the IRQ enable/disable flags (since v3.10a):
> ECCCLR.ecc_corrected_err_intr_en (R/W)
> ECCCLR.ecc_uncorrected_err_intr_en (R/W)
>
> reside in a single register - ECCCLR (Synopsys has renamed it to
> ECCCTL since v3.10a due to adding the IRQ En/Dis flags).
>
> Thus any concurrent access to that CSR like "Clear IRQ
> status/counters" and "IRQ disable/enable" need to be protected from
> the race condition.

Ok, let's pick this apart one-by-one. I'll return to the rest you're
explaining as needed.

So, can writes to the status/counter bits while writing the *same* bit
to the IRQ enable/disable bit prevent any race conditions?

Meaning, you only change the status and counter bits and you preserve
the same value in the IRQ disable/enable bit?

IOW, I'm thinking of shadowing that ECCCTL in software so that we update
it from the shadowed value.

Because, AFAIU, the spinlock won't help if you grab it, clear the
status/counter bits and disable the interrupt in the process. You want
to only clear the status/counter bits and leave the interrupt enabled.

Right?

IOW, in one single write you do:

ECCCLR.ecc_corrected_err_clr=1
ECCCLR.ecc_uncorrected_err_clr=1
ECCCLR.ecc_corr_err_cnt_clr=1
ECCCLR.ecc_uncorr_err_cnt_clr=1
ECCCLR.ecc_corrected_err_intr_en=1
ECCCLR.ecc_uncorrected_err_intr_en=1

?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette