Re: [PATCH 00/14] Fix the EDAC API

From: Aristeu Rozanski Filho
Date: Thu Mar 29 2012 - 16:47:00 EST


On Thu, Mar 29, 2012 at 02:06:47PM -0300, Mauro Carvalho Chehab wrote:
> The EDAC API is broken for any memory controller that doesn't use
> a DIMM rank as its primary unit.
>
> That covers RAMBUS and FB-DIMM drivers, where it is impossible to
> track a single rank, as the rank is hidden by a buffer controller
> (AMB - Advanced Memory Buffer, in the case of FB-DIMM).
>
> Also, newer Intel architectures (Nehalem and Sandy Bridge) bring
> advanced memory controllers, where the cachesize can be different
> than 128 bits, and up to 4 channels can be interleaved. The current
> EDAC API doesn't work for those.
>
> So, all drivers that need this play tricks to lie to the EDAC core,
> so that the memory is somehow exposed. There are several cases
> where this is done wrong.
>
> The only way to fix this is to create a new ABI capable of exporting
> what the driver actually sees, and not some virtual information
> produced by the driver just to make the EDAC core happy.
>
> As requested by Greg, the first step is to convert the EDAC MC code
> to use struct device. That means that 3 drivers also need to be
> converted (amd64, i7core and mpc85xx_edac), as they create their own
> ABIs.
>
> Those patches were compile-tested on all architectures.
>
> They were also tested on all types of memory controllers with EDAC
> support that I was able to find at Red Hat Labs:
> e752x_edac (a Xeon i3100 chipset)
> i3000_edac
> i3200_edac
> i5000_edac
> i5100_edac
> i5400_edac
> i7300_edac
> i7core_edac (Nehalem)
> sb_edac (Sandy Bridge E5)
> amd64_edac
>
> Several of them have multiple memory controllers (the amd64 hardware
> I used is the biggest one in terms of MCs, with 8 memory controllers).
>
> There are 3 intended changes that are left out of this series:
>
> - ABI documentation. I'll write the ABI patch as soon as I
> merge this series into -next;
>
> - New API UE/CE error counters. They're needed, but, as the
> discussions weren't finished, let's postpone it. I'll start work on
> it after the merge of this series.
>
> - MCA error trace. There hasn't been any agreement on it yet either,
> so keep it out of this series until we come to some conclusion.
>
> Regards,
> Mauro
>
> Mauro Carvalho Chehab (14):
>   edac: rewrite the sysfs code to use struct device
>   mpc85xx_edac: convert sysfs logic to use struct device
>   amd64_edac: convert sysfs logic to use struct device
>   i7core_edac: convert it to use struct device
>   edac: Get rid of the old kobj's from the edac mc code
>   edac: add a new per-dimm API and make the old per-virtual-rank API
>     obsolete
>   edac: add a sysfs node to report the maximum location for the system
>   edac: Add debufs nodes to allow doing fake error inject
>   edac: Create a per-Memory Controller bus
>   edac: Move grain/dtype/edac_type calculus to be out of channel loop
>   i82975x_edac: Test nr_pages earlier to save a few CPU cycles
>   i5100_edac: Fix a warning when compiled with 32 bits
>   i7300_edac: Get rid of some wrongly-solved rebase conflict
>   edac: Only expose csrows/channels on legacy API if they're populated
>
> drivers/edac/Kconfig | 8 +
> drivers/edac/amd64_edac.c | 43 +-
> drivers/edac/amd64_edac.h | 29 +-
> drivers/edac/amd64_edac_dbg.c | 89 ++--
> drivers/edac/amd64_edac_inj.c | 128 +++--
> drivers/edac/cpc925_edac.c | 54 +-
> drivers/edac/e752x_edac.c | 31 +-
> drivers/edac/e7xxx_edac.c | 32 +-
> drivers/edac/edac_mc.c | 60 +-
> drivers/edac/edac_mc_sysfs.c | 1322 +++++++++++++++++++++--------------------
> drivers/edac/edac_module.c | 13 +-
> drivers/edac/edac_module.h | 9 +-
> drivers/edac/i5000_edac.c | 3 -
> drivers/edac/i5100_edac.c | 4 +-
> drivers/edac/i7300_edac.c | 3 -
> drivers/edac/i7core_edac.c | 336 +++++++----
> drivers/edac/i82875p_edac.c | 4 -
> drivers/edac/i82975x_edac.c | 9 +-
> drivers/edac/mpc85xx_edac.c | 93 ++--
> include/linux/edac.h | 69 +--
> 20 files changed, 1250 insertions(+), 1089 deletions(-)
Reviewed-by: Aristeu Rozanski <arozansk@xxxxxxxxxx>

--
Aristeu
