Re: Fwd: [PATCH] x86: Use larger chunks in mtrr_cleanup

From: Luis R. Rodriguez
Date: Thu Nov 05 2015 - 14:44:24 EST


On Thu, Nov 5, 2015 at 11:14 AM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
> On Mon, Sep 14, 2015 at 7:46 AM, Stuart Hayes <stuart.w.hayes@xxxxxxxxx> wrote:
>>
>> Booting with 'disable_mtrr_cleanup' works, but the system I am working with
>> isn't actually failing--it just gets ugly error messages. And the BIOS on the
>> system I am working with had set up the MTRRs correctly.
>
> Please post boot log and /proc/mtrr for:
> 1. without your patch
> 2. without your patch and with disable_mtrr_cleanup in boot command line.
> 3. with your patch.

Stuart,

to provide some context -- I reached out to Yinghai as he wrote the
original mtrr cleanup code. The commit logs seem to read that a crash
was possible on systems with > 4 GiB RAM with some types of BIOSes...
The cleanup code seems to trigger when variable MTRRs do not exist
that are UC, or when all varible MTRRs that exist are just UC + WB
(Yinghai correct me if I'm wrong). The commit log in question
(95ffa2438d0e9 "x86: mtrr cleanup for converting continuous to
discrete layout, v8") was not very clear about the cause of the crash
-- but suppose the issue here was the BIOS on some systems might want
to create some UC variable MTRRs early on and there was no UC MTRRs
available, and I can only guess the cleanup exists as hack for those
BIOSes. Even if that was the case -- its still not clear *why* the
crash would happen but I suppose a driver mishap can happen without UC
guarantees for some devices the BIOS may want to enable UC MTRR on.

To be able to determine what we do upstream we need to understand the
above first. We also need to understand if the cleanup might also be
implicated by userspace drivers using /proc/mtrr, or if a proprietary
driver exists that does use mtrr_add() directly even though PAT has
been available for ages and all drivers are now properly converted.

With clear answers to the above we'll be able to determine what the
right course of action should be for this patch. For instance I'm
inclined to strive to disable the complex cleanup code if we don't
need it anymore, but if we do need it your patch makes sense. If the
patch makes sense then though are we going to have to keep updating
the segment size *every time* as systems grow? That seems rather
silly. And if PAT is prevalent why are vendors adding MTRRs still? The
cleanup seems complex and a major hack for a fix for some BIOSes, I'd
much rather identify the exact issue and only have a fix to address
that case.

Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/