Re: Are these MTRR settings correct?

From: Robert Hancock
Date: Tue Dec 15 2009 - 15:50:52 EST


On Tue, Dec 15, 2009 at 10:55 AM, Bjorn Helgaas <bjorn.helgaas@xxxxxx> wrote:
> On Monday 14 December 2009 06:42:11 pm Yinghai Lu wrote:
>> Robert Hancock wrote:
>> > Something else isn't quite right. It looks like MMCONFIG area should be
>> > reserved:
>> >
>> > [    0.308434] system 00:0c: iomem range 0xe0000000-0xefffffff has been
>> > reserved
>> >
>> > but the code didn't seem to detect that. In fact there doesn't seem to
>> > be any output about whether it was or wasn't reserved, which from the
>> > code it seems there should be.
>> >
>> > Maybe because of that ACPI method execution error?
>>
>> could be some pnpacpi brokenness?
>
> Robert, I assume you're referring to this from Tvrtko's post
> (http://lkml.org/lkml/2009/12/13/90):
>
> [    0.000000]  BIOS-e820: 00000000dffd0000 - 00000000e0000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
> ...
> [    0.250088] PCI: Found AMD Family 10h NB with MMCONFIG support.
> [    0.250091] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
> [    0.250092] PCI: Not using MMCONFIG.
> ...
> [    0.253491] ACPI Error (psargs-0359): [ECEN] Namespace lookup failure, AE_NOT_FOUND
> [    0.253495] ACPI Error (psparse-0537): Method parse/execution failed [\] (Node ffffffff81656ab0), AE_NOT_FOUND
> ...
> [    0.308434] system 00:0c: iomem range 0xe0000000-0xefffffff has been reserved
>
> I think we're rejecting MMCONFIG in the early call to
> pci_mmcfg_reject_broken(), when we check only E820 resources, not
> ACPI resources.  And indeed, the 0xe0000000-0xefffffff range is
> not mentioned in E820.  Which output did you expect to see?
>
> I am uncomfortable with this early/late checking and looking at both
> E820 and ACPI.  It just feels hacky and error-prone.  I'm not happy about
> adding Yinghai's special-case "if we found AMD Fam10h, don't check for
> reservations" patch either.

I would expect to see the report of whether it was reserved in ACPI or
not, if the E820 check failed. The early check would reject it, but
the late call (after the ACPI interpreter initializes) would accept it
because it discovers that it's actually reserved in the ACPI
motherboard resources.
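
To make the early/late difference concrete, here is a small sketch that models both checks using the values from Tvrtko's log. It is not the kernel's actual pci_mmcfg_reject_broken() logic. It assumes 1 MB of ECAM space per bus (32 devices x 8 functions x 4 KB) and treats the BIOS-e820 lines as printing an exclusive end address. The helper names mcfg_window() and covered() are made up for illustration.

```python
# Hypothetical model of the early (E820) and late (ACPI) MMCONFIG reservation
# checks. Ranges are inclusive (start, end) physical addresses.

ECAM_BYTES_PER_BUS = 1 << 20  # 32 devices x 8 functions x 4 KB of config space


def mcfg_window(base, start_bus, end_bus):
    """Physical range covered by one MCFG entry's ECAM area."""
    start = base + start_bus * ECAM_BYTES_PER_BUS
    end = base + (end_bus + 1) * ECAM_BYTES_PER_BUS - 1
    return (start, end)


def covered(window, reserved):
    """True if every byte of window lies in the union of the reserved ranges."""
    cur, last = window
    while True:
        hit = next((r for r in reserved if r[0] <= cur <= r[1]), None)
        if hit is None:
            return False          # a gap: some part of the window is unreserved
        if hit[1] >= last:
            return True           # this range reaches the end of the window
        cur = hit[1] + 1          # continue from just past this range


# "MCFG configuration 0: base e0000000 segment 0 buses 0 - 255"
window = mcfg_window(0xE0000000, 0, 255)

# BIOS-e820 "reserved" lines; the log prints an exclusive end, converted here.
e820_reserved = [(0xDFFD0000, 0xDFFFFFFF), (0xFF700000, 0xFFFFFFFF)]

# "system 00:0c: iomem range 0xe0000000-0xefffffff has been reserved"
acpi_motherboard = [(0xE0000000, 0xEFFFFFFF)]

early_ok = covered(window, e820_reserved)    # early check: E820 only
late_ok = covered(window, acpi_motherboard)  # late check: after ACPI init

print(f"MCFG window {window[0]:#x}-{window[1]:#x}")
print("early (E820):", "reserved" if early_ok else "not reserved, reject")
print("late (ACPI):", "reserved, accept" if late_ok else "not reserved, reject")
```

With these inputs the window comes out as 0xe0000000-0xefffffff, the E820 check fails and the ACPI check passes, which is the case where the late check ought to print something.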

>
> I assume that Windows runs on this box without requiring per-machine
> hacks in the kernel.  Linux should be able to do the same, and the fact
> that we can't is telling us we're doing something wrong.  We should fix
> whatever's wrong rather than papering over it.

I wouldn't have a problem with the E820 check being removed, since
it's not actually reliable by itself anyway. In fact I'm not sure that
we need any of the reservation checks at all.

The whole reason this junk is in there is that, early on, a bunch of the
problems people were seeing with systems not working with MMCONFIG
turned on were thought to be caused by broken MMCONFIG on their
motherboards. The reservation checks were there to try to weed this
out by making sure the MCFG data pointed to a memory area that was
marked as reserved. Originally only E820 was checked, which turned out
to be invalid because the PCI Firmware Specification only told BIOS
people to reserve the space in ACPI motherboard resources, not in E820.

Later on it was discovered that most of the problems were caused by us
doing all config-space access through MMCONFIG, including access to the
base (first 256 bytes) config space. Combined with the fact that we
don't disable decode on PCI devices when sizing memory BARs, the BAR
location during sizing could overlap the MMCONFIG area, and the device
would then suck up the MMCONFIG accesses, usually causing a lockup. So
it wasn't actually due to broken MMCONFIG on any motherboards at all.
This was solved by using MMCONFIG for extended config space access
only, so that when we move a BAR temporarily during sizing, we're not
trying to access the MMCONFIG region it overlaps (BAR sizing only
touches registers in the base config space).
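
A rough illustration of the overlap, assuming a 32-bit memory BAR whose size is a power of two and the MCFG window from this report. The 256 MB and 512 MB BAR sizes are hypothetical examples, and sizing_window(), overlaps() and config_mechanism() are made-up helpers. The last one models the routing rule described above: legacy type 1 port I/O for offsets below 0x100, MMCONFIG only for extended registers.

```python
# Hypothetical sketch: why sizing a large 32-bit memory BAR could swallow
# MMCONFIG accesses, and the "MMCONFIG only for extended space" routing.

MMCONFIG = (0xE0000000, 0xEFFFFFFF)  # MCFG window from this report (inclusive)


def sizing_window(bar_size):
    """Range a 32-bit memory BAR decodes while all-ones is written to it.

    The low log2(bar_size) address bits are hardwired to zero, so writing
    0xFFFFFFFF parks the BAR at the top of the 4 GB space.
    """
    base = 0xFFFFFFFF & ~(bar_size - 1)
    return (base, base + bar_size - 1)


def overlaps(a, b):
    """True if inclusive ranges a and b share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


def config_mechanism(offset):
    """Assumed routing: type 1 (ports 0xCF8/0xCFC) below 0x100, else MMCONFIG."""
    if not 0 <= offset < 0x1000:
        raise ValueError("config offset out of range")
    return "type1" if offset < 0x100 else "mmconfig"


for size in (256 << 20, 512 << 20):
    win = sizing_window(size)
    print(f"{size >> 20} MB BAR sizes at {win[0]:#x}-{win[1]:#x}, "
          f"overlaps MMCONFIG: {overlaps(win, MMCONFIG)}")

# BAR registers sit at offsets 0x10-0x24, so restoring a BAR uses type 1.
print("BAR0 (0x10) via", config_mechanism(0x10))
print("extended register (0x100) via", config_mechanism(0x100))
```

In this model the 512 MB BAR parks at 0xe0000000 during sizing and covers the whole MMCONFIG window, so a config write sent through MMCONFIG to restore it would be claimed by the device itself. Routing offsets below 0x100 through type 1 sidesteps that, because port I/O is not decoded by the memory BAR.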

(Lesson: yes, BIOSes are broken a lot, but you can't jump to conclusions.)

It would be interesting to know if there are any systems where the
code reports the MCFG area is not reserved in the ACPI motherboard
resources. I would tend to suspect not, because if it wasn't, Windows
would potentially assign devices to that memory area on such boards
and cause things to fail horribly, which presumably isn't happening.
We might be able to just get rid of all that code.