Re: [PATCH v2 8/8] x86/MCE/AMD: Support new memory interleaving modes during address translation

From: Borislav Petkov
Date: Mon Sep 28 2020 - 14:14:18 EST


On Mon, Sep 28, 2020 at 10:53:50AM -0500, Yazen Ghannam wrote:
> I don't have any clear reasons. I just get vague use cases sometimes
> about not using EDAC and relying on other things. But it shouldn't hurt
> to have the module load anyway. The EDAC messages can be suppressed, and
> the sysfs interface can be ignored. So, after a bit more thought, this
> doesn't seem like a good reason.

Ok. We can always carve it out if someone comes up with a valid reason
later.

> I agree that the translation code is implementation-specific and applies
> only to DRAM ECC errors, so it makes sense to have it in amd64_edac. The
> only issue is getting the address translation to earlier notifiers. I
> think we can add a new one in amd64_edac to run before others. Maybe this
> can be a new priority class like MCE_PRIO_PREPROCESS, or something like
> that for notifiers that fixup the MCE data.

Well, I'm not sure you need notifiers here - you wanna call
mce_usable_address() and in it, it should do the address conversion
calculation to give you a physical address which you can feed to
memory_failure etc.

Now, mce_usable_address() is core code and we can make core code call
into a module but that is yucky. So *that* is your reason for keeping it
where it is.
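Here is a rough userspace toy model of what "core code calls into a module" would look like. It only illustrates the point above: the core can reach module code only through a pointer that the module registers, and it has to cope with that pointer being NULL. Everything in it is made up for illustration and is not the real kernel API: struct mce_model, mce_usable_address_model(), register_addr_xlate(), toy_xlate() and the 4 GiB offset. The ADDRV bit position is the only part that mirrors MCi_STATUS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct mce. */
struct mce_model {
	uint64_t status;
	uint64_t addr;	/* normalized address as reported by the UMC */
};

/* Mirrors MCi_STATUS[ADDRV] (bit 58): the address field is valid. */
#define STATUS_ADDRV	(1ULL << 58)

/* Hypothetical hook a module would register; NULL while it is not loaded. */
typedef int (*addr_xlate_fn)(uint64_t norm_addr, uint64_t *sys_addr);
static addr_xlate_fn addr_xlate_hook;

static void register_addr_xlate(addr_xlate_fn fn)
{
	addr_xlate_hook = fn;
}

/*
 * Toy model of a core-code check: the address is usable only if it is
 * valid AND the module-provided translation succeeds. On success, the
 * normalized address is replaced with the translated system address.
 */
static bool mce_usable_address_model(struct mce_model *m)
{
	uint64_t sys_addr;

	if (!(m->status & STATUS_ADDRV))
		return false;

	/* Module not loaded: nothing can translate the address. */
	if (!addr_xlate_hook)
		return false;

	if (addr_xlate_hook(m->addr, &sys_addr))
		return false;

	m->addr = sys_addr;
	return true;
}

/* Toy "module" translation: pretend DRAM starts at 4 GiB. */
static int toy_xlate(uint64_t norm_addr, uint64_t *sys_addr)
{
	*sys_addr = norm_addr + (4ULL << 30);
	return 0;
}
```

The NULL check, plus the complete lack of locking around the pointer, is exactly the yucky part: real code would also have to make sure the module cannot be unloaded while the core is calling through the hook.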

Looking at its size:

$ readelf -s vmlinux | grep umc_normaddr_to
2864: ffffffff817d8ae5 168 FUNC LOCAL DEFAULT 1 umc_normaddr_to_[...]
91866: ffffffff81030e00 1127 FUNC GLOBAL DEFAULT 1 umc_normaddr_to_[...]

that's 168 + 1127 = 1295 bytes, i.e. ~1.3K, and if you split it and do
some experimenting, you might get it even slimmer. Not that ~1.3K is that
huge by current standards but we should always aim at not bloating the
fat guy our kernel already is.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette