DMA-safety for edac_atomic_scrub on ARM

From: Jan Lübbe
Date: Fri Jun 09 2017 - 09:56:37 EST


Hi,

I've CCed Rob as the original author of the ARM EDAC scrub function.

On Fri, 2017-06-09 at 15:14 +0200, Jan Lübbe wrote:
> [...]
> > + mci->scrub_mode = SCRUB_SW_SRC;
> I'm not sure if this works as expected on ARM as it is currently
> implemented, but that's a topic for a different mail.

Some more background as I understand it so far:

Configuring an EDAC MC to scrub_mode = SCRUB_SW_SRC causes the common
error handler code to run a per-arch software scrub function. This mode
is used by several EDAC drivers on ARM.

drivers/edac/edac_mc.c:
> if (mci->scrub_mode == SCRUB_SW_SRC) {
>         /*
>          * Some memory controllers (called MCs below) can remap
>          * memory so that it is still available at a different
>          * address when PCI devices map into memory.
>          * MC's that can't do this, lose the memory where PCI
>          * devices are mapped. This mapping is MC-dependent
>          * and so we call back into the MC driver for it to
>          * map the MC page to a physical (CPU) page which can
>          * then be mapped to a virtual page - which can then
>          * be scrubbed.
>          */
>         remapped_page = mci->ctl_page_to_phys ?
>                 mci->ctl_page_to_phys(mci, page_frame_number) :
>                 page_frame_number;
>
>         edac_mc_scrub_block(remapped_page,
>                             offset_in_page, grain);
> }
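
For context, a driver opts into this path simply by setting the scrub
mode (and, if needed, the remapping callback) in its probe function.
A minimal sketch, not taken from any particular driver:

        /* MC pages map 1:1 to CPU pages, no remapping callback needed */
        mci->ctl_page_to_phys = NULL;
        /* ask the EDAC core to re-read/write back corrected locations */
        mci->scrub_mode = SCRUB_SW_SRC;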

edac_mc_scrub_block() then basically checks if it actually hit a valid
PFN, maps the page with kmap_atomic and runs edac_atomic_scrub().
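
For reference, edac_mc_scrub_block() in drivers/edac/edac_mc.c looks
roughly like this (paraphrased from my reading of the code, slightly
condensed, not a verbatim quote):

        static void edac_mc_scrub_block(unsigned long page,
                                        unsigned long offset, u32 size)
        {
                struct page *pg;
                void *virt_addr;
                unsigned long flags = 0;

                /* The reported page is not in our memory: ignore it */
                if (!pfn_valid(page))
                        return;

                pg = pfn_to_page(page);

                /* Disable interrupts while a highmem page is mapped */
                if (PageHighMem(pg))
                        local_irq_save(flags);

                virt_addr = kmap_atomic(pg);

                /* Perform the arch-specific atomic scrub operation */
                edac_atomic_scrub(virt_addr + offset, size);

                kunmap_atomic(virt_addr);

                if (PageHighMem(pg))
                        local_irq_restore(flags);
        }
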
For ARM this is implemented in arch/arm/include/asm/edac.h:
> /*
>  * ECC atomic, DMA, SMP and interrupt safe scrub function.
>  * Implements the per arch edac_atomic_scrub() that EDAC use for software
>  * ECC scrubbing. It reads memory and then writes back the original
>  * value, allowing the hardware to detect and correct memory errors.
>  */
>
> static inline void edac_atomic_scrub(void *va, u32 size)
> {
> #if __LINUX_ARM_ARCH__ >= 6
>         unsigned int *virt_addr = va;
>         unsigned int temp, temp2;
>         unsigned int i;
>
>         for (i = 0; i < size / sizeof(*virt_addr); i++, virt_addr++) {
>                 /* Very carefully read and write to memory atomically
>                  * so we are interrupt, DMA and SMP safe.
>                  */
>                 __asm__ __volatile__("\n"
>                         "1: ldrex %0, [%2]\n"
>                         "   strex %1, %0, [%2]\n"
>                         "   teq %1, #0\n"
>                         "   bne 1b\n"
>                         : "=&r"(temp), "=&r"(temp2)
>                         : "r"(virt_addr)
>                         : "cc");
>         }
> #endif
> }

The comment "ECC atomic, DMA, SMP and interrupt safe scrub function"
seems to have been copied from the initial implementation on x86, first
to powerpc, then to mips and later to arm.

On ARM, other bus masters are usually not coherent with the CPUs. This
means that exclusive loads/stores are only monitored within (at most)
the shareable domain, which is usually a subset of the SoC.

Consequently, when a bus master outside of the shareable domain (such as
an ethernet controller) writes to the same memory after the ldrex, the
strex will simply succeed and write the stale data into the CPU cache,
which can later overwrite the data written by the ethernet controller.
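
A rough interleaving of how I think this can go wrong (assuming the DMA
master writes straight to DRAM, invisible to the CPU caches and the
exclusive monitor):

        CPU (scrub loop)                  non-coherent DMA master
        ----------------                  -----------------------
        ldrex  %0, [addr]   ; old value
                                          writes new data for addr to
                                          DRAM; the CPU's exclusive
                                          monitor and caches never
                                          observe this access
        strex  %1, %0, [addr]
            ; succeeds, since nothing in the shareable domain touched
            ; addr; the *old* value now sits dirty in the CPU cache line
        ...
        cache line for addr is later cleaned/evicted
            ; the stale value reaches DRAM and overwrites the new data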

I'm not sure if it is actually possible to implement this in a DMA-safe
way on ARM without stopping all other bus masters.

Regards,
Jan
--
Pengutronix e.K.                           |                              |
Industrial Linux Solutions                 | http://www.pengutronix.de/   |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0     |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555  |