Re: Extended H/W error log driver

From: Chen Gong
Date: Tue Oct 15 2013 - 00:22:39 EST


On Mon, Oct 14, 2013 at 12:55:33PM +0200, Borislav Petkov wrote:
> On Mon, Oct 14, 2013 at 02:49:40AM -0400, Chen Gong wrote:
> > On Fri, Oct 11, 2013 at 10:04:27AM +0200, Borislav Petkov wrote:
> > > > [56005.786154] {4}Hardware error detected on CPU0
> > > > [56005.786159] {4}event severity: corrected
> > > > [56005.786162] {4}sub_event[0], severity: corrected
> > >
> > > This sub_event[0] could use better decoding though.
> > >
> > What's your suggestion?
>
> Well, if this only enumerates the sections in CPER, then we can simply
> drop it.
>
Some errors have multiple subsections, for example:

[ 1442.070522] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 1442.070528] {2}[Hardware Error]: event severity: corrected
[ 1442.070531] {2}[Hardware Error]: sub_event[0], severity: corrected
[ 1442.070534] {2}[Hardware Error]: section_type: memory error
[ 1442.070537] {2}[Hardware Error]: error_status: 0x0000000000000000
[ 1442.070539] {2}[Hardware Error]: sub_event[1], severity: corrected
[ 1442.070541] {2}[Hardware Error]: section_type: memory error
[ 1442.070543] {2}[Hardware Error]: error_status: 0x0000000000000000
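
As a rough illustration of that layout, here is a user-space sketch that emits one sub_event[i] line per section, followed by that section's details. It is not the kernel's cper.c code: the struct names, the enum, and format_estatus() are all hypothetical, simplified stand-ins.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's CPER definitions. */
enum sev { SEV_RECOVERABLE, SEV_FATAL, SEV_CORRECTED, SEV_INFO };

struct section {
	enum sev severity;
	const char *type;			/* e.g. "memory error" */
	unsigned long long error_status;
};

struct estatus {
	enum sev severity;
	size_t nsections;
	const struct section *sections;
};

static const char *sev_str(enum sev s)
{
	switch (s) {
	case SEV_RECOVERABLE:	return "recoverable";
	case SEV_FATAL:		return "fatal";
	case SEV_CORRECTED:	return "corrected";
	case SEV_INFO:		return "info";
	}
	return "unknown";
}

/* Append formatted text at offset off; a full buffer drops further output. */
static size_t append(char *buf, size_t len, size_t off, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (off >= len)
		return off;
	va_start(ap, fmt);
	n = vsnprintf(buf + off, len - off, fmt, ap);
	va_end(ap);
	return n < 0 ? off : off + (size_t)n;
}

/*
 * Format one record: the event severity first, then one "sub_event[i]"
 * header per section followed by that section's type and status.
 * Returns the length the full text needs (>= len means truncated).
 */
size_t format_estatus(char *buf, size_t len, const char *pfx,
		      const struct estatus *e)
{
	size_t off = 0, i;

	assert(buf != NULL && len > 0 && pfx != NULL && e != NULL);
	buf[0] = '\0';
	off = append(buf, len, off, "%sevent severity: %s\n",
		     pfx, sev_str(e->severity));
	for (i = 0; i < e->nsections; i++) {
		const struct section *s = &e->sections[i];

		off = append(buf, len, off, "%ssub_event[%zu], severity: %s\n",
			     pfx, i, sev_str(s->severity));
		off = append(buf, len, off, "%ssection_type: %s\n",
			     pfx, s->type);
		off = append(buf, len, off, "%serror_status: 0x%016llx\n",
			     pfx, s->error_status);
	}
	return off;
}
```

Without the sub_event[i] lines, the two memory error sections above would run together with nothing marking where one ends and the next begins.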

> Btw, I was wondering, why are we dropping
> cper_estatus_section_flag_strs? This is again the same issue with
> changing output which people might already rely upon.
>

That depends on how we shrink the output. Your reply to the other
patch looks like a good idea. Let me try it first.

> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
