Re: PCI resource problems caused by improper address rounding

From: Linus Torvalds
Date: Tue Dec 18 2007 - 16:11:56 EST

Next message: Rik van Riel: "[patch 01/20] convert anon_vma list lock a read/write lock"
Previous message: Roland Dreier: "Re: [RFC] dma: passing "attributes" to dma_map_* routines"
In reply to: Richard Henderson: "Re: PCI resource problems caused by improper address rounding"
Next in thread: Chuck Ebbert: "Re: PCI resource problems caused by improper address rounding"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, 18 Dec 2007, Richard Henderson wrote:
>
> I've added dmesg, /proc/iomem, and lspci -v output to that bug.
>
> Basically, we have
>
> c0000000-cfffffff : free
> ddf00000-dfefffff : PCI Bus #04
> e0000000-efffffff : pnp 00:0b
> f0000000-fedfffff : less than 256MB

Gaah.

That really is very unlucky. That 256M only goes at one point in the low
4GB, but the thing is, it fits perfectly well above it, and dammit, that
resource is explicitly a 64-bit resource or a really good reason.

However, I wonder about that

e0000000-efffffff : pnp 00:0b

thing. I actually suspect that that whole allocation is literally *meant*
for that 256MB graphics aperture, but the kernel explicitly avoids it
because it's listed in the PnP tables.

I wonder what the heck is the point of that pnp entry. Just for fun, can
you try to just disable CONFIG_PNP, and see if it all works then?

Björn Helgaas added to Cc to clarify what those pnp entries tend to mean,
and whether there is possibly some way to match up a specific pnp entry
with the PCI device that might want to use it. Because that is a nice
256MB region that really doesn't seem to make sense for anything else than
the graphics buffer - there's nothing else in your system that seems
likely (although I guess it could be for some docking port, but even then
I'd have expected one of the PCI bridges to map it!)

But apart from the question about that pnp 00:0b device, the kernel
resource allocation really does look perfectly fine, and while we could
shoe-horn it into the low 4GB in this case by just hoping that there is
nothing undocumented there (and there probably isn't), it's really
annoying considering that big graphics areas are a hell of a good reason
to use those 64-bit resources.

It's not like 256MB is even as large as they come, half-gig graphics cards
are getting to be fairly common at the high end, and X absolutely _has_ to
be able to handle a 64-bit address for those.

Also, I'm surprised it doesn't work with X already: the ChangeLog for X
says that there are "Minor fixes to the handling of 64-bit PCI BARs [..]"
in 4.6.99.18, so I'd have assumed that XFree86-4.7.0 should be able to
handle this perfectly well.

I'll add Keithp to the cc too, to see if the X issues can be clarified.
Maybe he can set us right. But maybe you just have an old X server? If so,
considering the situation, I really think the kernel has done a good job
already, and I'd be *very* nervous about making the kernel allocate new
PCI resources right after the end-of-memory thing.

I bet it would work in this case, but as mentioned, we definitely know of
cases where the BIOS did *not* document the magic memory region that was
stolen for UMA graphics, and trying to put PCI devices just after the top
of reserved memory in the e820 list causes machines to not work at all
because the address decoding will clash.

Of course, we could also make the minimum address more of a *hint*, and
only make the resource allocator only abut the top-of-known-memory when it
absolutely has to, but on the other hand, in this case it really doesn't
have to, since there's just _tons_ of space for 64-bit resources. So the
correct thing really does seem to be to just use the 64-bit hw that is
there.

> That would have been an excellent comment to add to that code then,
> rather than just "rounding up to the next 1MB area", because purely
> as rounding code it is erroneous.

Patches to add comments are welcome. There are few enough people who
actually work on the PCI resource allocation code these days (I wish there
were more), and it's very rare that anybody else than me or Ivan ends up
even *looking* at it. So it's not been a big issue.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Rik van Riel: "[patch 01/20] convert anon_vma list lock a read/write lock"
Previous message: Roland Dreier: "Re: [RFC] dma: passing "attributes" to dma_map_* routines"
In reply to: Richard Henderson: "Re: PCI resource problems caused by improper address rounding"
Next in thread: Chuck Ebbert: "Re: PCI resource problems caused by improper address rounding"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]