Re: [PATCH v24b] RAS: Add a tracepoint for reporting memorycontroller events

From: Borislav Petkov
Date: Fri May 18 2012 - 08:43:54 EST


On Fri, May 18, 2012 at 07:59:55AM -0300, Mauro Carvalho Chehab wrote:
> The "detail" field above is, in fact, a group of the following integer
> values:
>
> - physical address of the error, provided as 3 integers: page, offset an
> the error offset granularity;
>
> - ECC syndrome, with has a XOR value used to determine how many and what
> bit(s) have errors.
>
> "driver_detail" contains other driver-specific details about the errors.
>
> The "location" field is also given by 3 integers: layer0, layer1, layer2 with
> contains the branch/slot/channel or slot/channel or chip select row/channel
> position of the affected DIMM.
>
> On my last comment, I said that I'm convinced that those two fields should
> be broken into their integer components.
>
> The fields at "detail" and "location" are generated by a snprintf() logic
> inside the EDAC core. "driver_detail" is filled by the drivers.
>
> I did that merge in order to provide a clearer output message, but this
> doesn't help userspace tools that could get the binary information at
> the trace directly.
>
> There's a conceptual issue here, and it seems that Boris is not understanding,
> probably because he doesn't have any experience on handling errors on userspace.

Probably you're too stuck up with your error handling "experience" and
you fail to see that I'm trying to make this as generic as possible and
not fit it into anything so that EVERYONE can use it.

> In this point, my previous experience on writing network management
> and my work at the userspace EDAC tool helps me to see both faces of
> the coin.
>
> Currently, EDAC prints an error message, and increments some Kernel counters
> to indicate where the error is located.
>
> Userspace tools can either get the dmesg logs or read the error counters.
> The edac-tools (with is the only public tool for EDAC that I'm aware of)
> prefers to rely on the error counters, instead of parsing the error logs.
>
> I suspect that one of the reasons is because printk messages are not a
> consistent ABI, as changing them are not considered as a Kernel regression.

Oh yeah, teach me more about that, that's like not appearing maybe once
a week on lkml but what do I know...

> There are several issues on relying at the Kernel error counters: they
> don't have decay or any other kind of logic that would allow detecting
> bursts.

Guess what, you can do decay very successfully in userspace.

> So, on a machine running for 30 days with, let's say, 10 corrected
> errors, it is not possible to tell if they were all generated on a
> burst (because of some temporary interference) or if they are sparse
> errors generated at the same DRAM, indicating a bad memory.

Why not, you don't have timestamps?

> The big advantage (maybe the only advantage) of using a tracepoint is that
> the binary data can be exported directly.

Really?? You don't say. Teach me more about tracepoints and why it is a
good thing to use them for error reporting.

> If the location and memory location integers are exported as integers,
> it is simple to change the edac-tools to work with tracepoints,
> instead of working with the error counters.

Yeah, so?

> With this simple change, the tool can be improved to provide a more
> consistent memory error report, as userspace should not be worried on
> parsing fields that may change in future without notice.

Which userspace, your userspace? What happens if another userspace comes
with slightly different error formatting requirements? We change the
kernel again or we can't because this tracepoint is being used and we
add another tracepoint and let the bloat begin?

> Also, doing that will avoid the extra effort of joining everything into
> a single string, and then breaking them back into their individual fields
> on userspace.

I'm being told this is very easy to do in userspace in your garden
variety language.

> > but feed it
> > straight into a MIB (which could mean "Men In Black" for all I know),
> > right from the tracepoint:
>
> I can do a s/MIB/Management Information Base/. We can also avoid all other
> acronyms that have more than one sense, like RAS (with all network
> people will associate it with "Remote Access Service").

Oh, I know, MIB=Mauro Is Best.

> >> Of course, any userspace tools meant to handle errors should not parse
> >> the above data. They should, instead, use the binary fields provided by
> >> the tracepoint, mapping them directly into their MIBs.
> >
> > And I wanted to have a generic, usable-for-all tracepoint output
> > which anyone in userspace can parse, decode, cut, paste as she sees
> > fit without forcing kernel output formatting into any abstract error
> > management hierarchy or whatever.
>
> s/printk/trace/? That doesn't sound a good idea. The advantage of
> tracepoints of printk's is that they provide an ABI that allows passing
> strucutured information to userspace.

WTF? Maybe you don't understand what I'm talking about at all and I'm
wasting my time completely.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/