Re: [PATCH] pci: increase alignment to make more space for hidden code

From: Ingo Molnar
Date: Tue Oct 13 2009 - 02:50:46 EST



* Bjorn Helgaas <bjorn.helgaas@xxxxxx> wrote:

> We've established that the bridge and the NIC are handed off from BIOS
> like this:
>
> pci 0000:00:1c.4: bridge io port: [0x3000-0x3fff]
> pci 0000:00:1c.4: bridge 32bit mmio: [0xf4500000-0xf45fffff]
> pci 0000:07:00.0: reg 10 64bit mmio: [0xf4500000-0xf4503fff]
> pci 0000:07:00.0: reg 18 io port: [0x3000-0x30ff]
>
> Unless we boot with "acpi=off", this configuration is lost, and by the
> time we discover them, they look like this:
>
> pci 0000:00:1c.4: bridge io port: [0x00-0xfff]
> pci 0000:00:1c.4: bridge 32bit mmio: [0x000000-0x0fffff]
> pci 0000:00:1c.4: bridge 64bit mmio pref: [0x000000-0x0fffff]
> pci 0000:07:00.0: reg 10 64bit mmio: [0x000000-0x003fff]
> pci 0000:07:00.0: reg 18 io port: [0x00-0xff]
>
> Mystery #1 is why this configuration gets lost, and whether this is
> telling us about a Linux defect. We might get a clue about this if we
> could see what resources the NIC uses under Windows. If it uses the
> handoff range (0xf4500000-0xf4503fff), it's likely that Windows
> managed to keep the BIOS-programmed resources, and Linux is doing
> something wrong. If it uses some other range, then Windows likely had
> to reconfigure the device just like Linux does.

I can see two possibilities here, on the Linux side:

- AML: if there's an ACPI table with an AML script in it, with some
BIOS-provided vendor quirk that reprograms those BARs, that would
explain why acpi=off makes the side-effect go away. ACPI does not
touch BARs unless the firmware tells it to.

- The other possibility is an ACPI-table-driven Linux
PCI/driver/chipset quirk somewhere. With acpi=off that quirk does not
get executed.

> Mystery #2 is why, even with the lost configuration, 2.6.30 configures
> the NIC so it works, but 2.6.31 does not. In 2.6.30, we put the NIC
> in the [0xb8000000-0xb80fffff] range, and in 2.6.31, we put it in
> [0xb6000000-0xb60fffff]. I'd really like to know what the host bridge
> _CRS says. It's possible that we're only supposed to use the range
> above 0xb8000000. If that's the case, the fact that we're ignoring
> the _CRS would be another Linux defect.

Another theory would be just pure luck: the device might have a BAR
address constraint (which the BIOS knows about but doesn't tell us), and
2.6.30 gets it right accidentally while 2.6.31 violates the constraint.
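
To make the constraint idea concrete, here is a minimal Python sketch
that checks the two assignments quoted above against an allowed window.
The window (0xb8000000 up to the 4GB boundary) is purely an assumption
based on Bjorn's guess about what _CRS might say, not something read
from the firmware, and the names ASSUMED_WINDOW, fits() and check_all()
are made up for this illustration.

```python
# Hypothetical illustration: does an assigned range fit in an allowed window?
# ASSUMED_WINDOW is an ASSUMPTION (a guess about what _CRS might report),
# not data read from any real firmware.
ASSUMED_WINDOW = (0xB8000000, 0xFFFFFFFF)  # inclusive start/end

# NIC placements as quoted in this thread.
ASSIGNED = {
    "2.6.30": (0xB8000000, 0xB80FFFFF),
    "2.6.31": (0xB6000000, 0xB60FFFFF),
}


def fits(rng, window):
    """Return True if the inclusive range lies entirely inside the window."""
    start, end = rng
    w_start, w_end = window
    return w_start <= start and end <= w_end


def check_all(assigned, window):
    """Map each kernel version to whether its assignment fits the window."""
    return {kernel: fits(rng, window) for kernel, rng in assigned.items()}


if __name__ == "__main__":
    for kernel, ok in check_all(ASSIGNED, ASSUMED_WINDOW).items():
        state = "inside" if ok else "outside"
        print(f"{kernel}: {state} the assumed window")
```

If the real constraint turns out to be something else (an alignment
requirement, say), only fits() would need to change.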

> In the patch below, I added some extra PCI dumps of the bridge and the
> NIC around the ACPI EC init. The patch also removes Yinghai's
> workaround so we should see the original failure, just with a little
> more debug.

Btw, I'd _strongly_ suggest to finally add some sort of pci=verbose
easy-to-use debug toggle for users to enable.

It should print everything that matters to resource allocation: the BIOS
state (Yinghai did a patch for this some time ago and that is upstream
already), quirk execution, ACPI AML execution - everything that might
matter to PCI allocations.

An easy-to-use 'give me all the debug info' feature is really important.
We have apic=verbose for similar reasons.
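
Until such a toggle exists, the dumps have to be compared by hand or
with a script. A rough Python sketch of the latter follows; it only
understands the line format quoted earlier in this thread, and treating
a start address of 0 as "not assigned" is an assumption that happens to
match these dumps. The names parse_resources() and lost_resources() are
hypothetical.

```python
import re

# Matches lines in the format quoted in this thread, e.g.
#   pci 0000:07:00.0: reg 10 64bit mmio: [0xf4500000-0xf4503fff]
LINE_RE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<what>.+?): \[0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)\]"
)

# The two dumps quoted in this thread.
HANDOFF = """\
pci 0000:00:1c.4: bridge io port: [0x3000-0x3fff]
pci 0000:00:1c.4: bridge 32bit mmio: [0xf4500000-0xf45fffff]
pci 0000:07:00.0: reg 10 64bit mmio: [0xf4500000-0xf4503fff]
pci 0000:07:00.0: reg 18 io port: [0x3000-0x30ff]
"""

DISCOVERED = """\
pci 0000:00:1c.4: bridge io port: [0x00-0xfff]
pci 0000:00:1c.4: bridge 32bit mmio: [0x000000-0x0fffff]
pci 0000:00:1c.4: bridge 64bit mmio pref: [0x000000-0x0fffff]
pci 0000:07:00.0: reg 10 64bit mmio: [0x000000-0x003fff]
pci 0000:07:00.0: reg 18 io port: [0x00-0xff]
"""


def parse_resources(log_text):
    """Return (device, description, start, end) for each matching line."""
    found = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            found.append(
                (m["dev"], m["what"], int(m["start"], 16), int(m["end"], 16))
            )
    return found


def lost_resources(log_text):
    """Resources whose range starts at 0 (assumed: sized but not assigned)."""
    return [r for r in parse_resources(log_text) if r[2] == 0]


if __name__ == "__main__":
    for dev, what, start, end in lost_resources(DISCOVERED):
        print(f"{dev}: {what} looks unassigned [{start:#x}-{end:#x}]")
```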

Ingo