Re: [PATCHv7] EDAC core changes in order to properly report errors from all types of memory controllers

From: Mauro Carvalho Chehab
Date: Wed Mar 07 2012 - 06:36:31 EST


On 07-03-2012 05:42, Borislav Petkov wrote:
> On Tue, Mar 06, 2012 at 09:20:27PM -0300, Mauro Carvalho Chehab wrote:
>> The series now contains:
>
> The below looks like a good way to split this huge patchset into
> smaller, much easier to review ones:
>
>>
>> - 2 fix patches over upstream:
>> edac/ppc4xx_edac: Fix compilation

This one was already reviewed the first time I sent it,
so I'll leave it out of my mailbomb.

>> i5400_edac: Avoid calling pci_put_device() twice
>>
>> - 1 comments improvements:
>> edac: Improve the comments to better describe the memory concepts
>>
>> - 1 internal struct renaming patch:
>> edac: rename channel_info to rank_info
>>
>> - 6 patches that prepare the internal structures to represent the memory
>> properties per dimm, instead of per csrow. This is needed for modern
>> controllers, where the memories at different channels may be different:
>> edac: Create a dimm struct and move the labels into it
>> edac: Add per dimm's sysfs nodes
>> edac: move dimm properties to struct memset_info
>> edac: Don't initialize csrow's first_page & friends when not needed
>> edac: move nr_pages to dimm struct
>> edac: Add per-dimm sysfs show nodes
>>
>> - 2 patches that add proper support for FB-DIMM and for the modern Intel
>> DDR2/DDR3 memory controllers:
>> edac: Fix core support for MC's that see DIMMS instead of ranks
>> edac: Export MC hierarchy counters for CE and UE
>>
>> - 1 log cleanup patch, that prepares for using a MCA based tracepoint:
>> edac: Cleanup the logs for i7core and sb edac drivers
>>
>> - 2 debug improvement patches:
>> edac: Add a sysfs node to test the EDAC error report facility
>> edac: Initialize the dimm label with the known information
>>
>> - 5 post-FB-DIMM patches that clean, fix and/or improve a few random things:
>> edac_mc_sysfs: don't create inactive errcount sysfs nodes
>> i5000_edac: Fix the logic that retrieves memory information
>> edac: add a sysfs node that stores the max possible memory location
>> edac: Call the sysfs nodes as "rank" instead of "dimm" if chip select is used
>> i5400_edac: improve debug messages to better represent the filled memory

Ok, I'll mailbomb them.

>>
>> - 1 patch that adds a trace event to report memory errors:
>> events/hw_event: Create a Hardware Events Report Mecanism (HERM)
>
> NACK to that last one.

Hmm... interesting... this one adds a tracepoint for non-MCA based memory errors...
I understood that you were against only the MCA one...

Anyway, we have a deadlock with regard to tracing, as I'm nacking your approach,
and you're nacking mine.

I think we should then try to schedule a meeting (either in person or a conference
call) in order to address it, as it doesn't sound like we'll be able to solve it
via the ML.

>> While the preliminary tests are working OK on the machines I'm testing,
>> as I haven't finished the tests yet, some other fix patches may be needed,
>> but I'll insert them at the end of the series, as rebasing a large patchset
>> like that is very time-consuming.
>>
>> So, I think it is time to merge it at -next, in order to give more visibility
>> to it. So, tomorrow, I'll add it there, if I get no complaints.
>
> linux-next is not a testing ground for unfinished testing, unreviewed
> patches (I'm sure you already knew that), so before you send your stuff
> anywhere, it needs to be reviewed by the interested parties. One of
> them is me, I'm sure there are others, so please split them in proper
> patchsets, as I've already asked you (the above topical split could
> work) and send them to edac-devel and people for review.
>
> Thanks.
>

Thanks,
Mauro