Re: [PATCH] RAS: Add a tracepoint for reporting memory controller events

From: Borislav Petkov
Date: Thu May 31 2012 - 13:19:56 EST


On Thu, May 31, 2012 at 04:51:27PM +0000, Luck, Tony wrote:
> No, it's a 6-bit field used as a shift ... so if it has value "6", it
> means cache line granularity. Value "12" would mean 4K granularity.
> Architecturally it could say "30" to mean gigabyte, or even "63" to
> mean "everything is gone".

Right, 0x3f is 6 bits, correct. Doh!
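A minimal sketch of decoding the 6-bit shift Tony describes, assuming the field sits in bits 5:0 of MCi_MISC as the `0x3f` mask implies. The `MCI_MISC_ADDR_LSB()` mask is restated locally so the sketch builds outside the kernel tree. The helper `mce_error_granularity()` is hypothetical and shown only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bits 5:0 of MCi_MISC give the position of the least significant valid
 * address bit. Restated here; the kernel has its own MCI_MISC_ADDR_LSB().
 */
#define MCI_MISC_ADDR_LSB(m) ((m) & 0x3f)

/*
 * Hypothetical helper: size in bytes of the region an error report covers,
 * i.e. 2^lsb. lsb == 6 is a 64-byte cache line, lsb == 12 is a 4K page.
 */
static uint64_t mce_error_granularity(uint64_t misc)
{
    unsigned int lsb = MCI_MISC_ADDR_LSB(misc);

    /* A 6-bit field tops out at 63, so the shift never overflows 64 bits. */
    assert(lsb < 64);
    return 1ULL << lsb;
}
```

Because the value is a shift count rather than an enumeration, the same decode covers cache-line, page, gigabyte, and "everything is gone" (63) reports without any special cases.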

> >> while a few (IIRC patrol scrub) will report with page (4K)
> >> granularity. Linux doesn't really care - they all have to get rounded
> >> up to page size because we can't take away just one cache line from a
> >> process.
> >
> > I'd like to see that :-)
>
> Patrol scrub works inside the depths of the memory controller on rank/row
> addresses, not on system physical addresses. When it finds a problem, a
> reverse translation is needed to be able to report a system physical
> address in MCi_ADDR. Getting all the bits right is apparently a hard thing
> to do, so the MCI_MISC_ADDR_LSB bits are used to indicate that some low
> order bits are not valid.
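A sketch of how a consumer might use that field, assuming 4K base pages as on x86: clear the MCi_ADDR bits below the reported LSB, since they are not valid, then round to a page frame, because the kernel can only take memory away from a process a whole page at a time. The helper names and `SKETCH_PAGE_SHIFT` are hypothetical and are not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define MCI_MISC_ADDR_LSB(m) ((m) & 0x3f)

/* Assumption: 4 KiB base pages (x86). Hypothetical name. */
#define SKETCH_PAGE_SHIFT 12

/* Clear the address bits below the reported LSB; they are not valid. */
static uint64_t mce_valid_addr(uint64_t addr, uint64_t misc)
{
    unsigned int lsb = MCI_MISC_ADDR_LSB(misc);

    assert(lsb < 64);
    return (addr >> lsb) << lsb;
}

/* First page frame covered by the error, i.e. what would be offlined first. */
static uint64_t mce_first_pfn(uint64_t addr, uint64_t misc)
{
    return mce_valid_addr(addr, misc) >> SKETCH_PAGE_SHIFT;
}

/*
 * Number of pages the error may touch: sub-page granularity (e.g. a
 * cache line) still costs one whole page; coarser granularity spans
 * 2^(lsb - 12) pages.
 */
static uint64_t mce_pages_affected(uint64_t misc)
{
    unsigned int lsb = MCI_MISC_ADDR_LSB(misc);

    return lsb <= SKETCH_PAGE_SHIFT ? 1 : 1ULL << (lsb - SKETCH_PAGE_SHIFT);
}
```

For example, a cache-line report (lsb 6) at 0x12345678 becomes address 0x12345640 and still maps to the single page frame 0x12345, whereas a 2 MB report (lsb 21) at the same address covers 512 pages starting at frame 0x12200.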

Ok, thus the dynamic granularity. But we're going to end up reporting
rank and row too, so that the error can be matched to a DIMM. I consider
the physical address a bonus in such cases; it only matters to those who
like to replace single DRAM chips or single MOSFET transistors
:-) :-) :-).

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551