Re: pci-sysfs resource mmap broken (and PATCH)

From: Benjamin Herrenschmidt
Date: Thu Apr 28 2005 - 02:25:10 EST

> Yes.
> > Except from that page alignment thing ... which is the root of the
> > problem.
> You can hide all of those problems in libpci.a or whatever.
> You page align the offset, but pass back to the user a pointer
> with his sub-page offset applied to it.
> The kernel wants pages, so just give it pages :-)

Oh, yes, I agree, but I don't want libpci to contain too much arch
specific code, so we must may sure that libpci has the mean of "knowing"
how to deal with that. When you mmap "resource0", you can't know that
there is an alignement/offset issue and that what you'll be getting is
actually not pointing to your actual resource but somewhere ... below.

In order to know that, libpci need to know the actual physical pointer
that was used by the kernel for the mmap before the page alignment so it
can figure out what offset to apply. So that is why I'm saying that
whatever we expose in "resources" and/or "/proc/bus/pci/devices" should
at be a CPU physical address, or at least contain the low PAGE_SHIFT
bits of it... even if the rest is a token...

> > Cleaning is my goal, this is why I'm trying to clean the kernel
> > side first :) I'm also working separately on the problem of VGA access
> > arbitration (We'll probably do a joint session with the desktop summit
> > an the kernel summit about those issue).
> Yeah, that one is all about enabling VGA forwarding in the bridges.

Oh, not only that ... also playing ping pong when several cards are
there, tracking who is having the ownership, but also allowing the
driver who can disable legacy decodign on the card to inform the arbiter
so that card stop beeing bothered and can keep interrupts enabled at all
time without bothering etc.. etc.. etc.. that includes fixing the kernel
vga console to deal with an arbiter, some fbdev stuffs as well, and
more.. an finally adapting userland things like X to use it.

The actual arbiter core itself is easy. Making things play nice with it
is the difficult part. I'll come up with various solutions that I wnat
to experience with before KS/DS and will present my results there so we
can decide what to do.

> Taking out all of the resource garbage in the X server, and replacing
> it with properly synchronized calls in the kernel for mapping ROMs
> and changing the current VGA forwarding seems to be the way to go.

It is, and the people do agree, provided there is no loss in

> Some scsi controllers have prefetchable set in their normal
> register BARs. The sym53c8xx does if I remember correctly.

Hrm... do we have to worry about somebody in userland mmap'ing it's
register via /sys/*pci and breaking because of that ? :) I have a real
net big performance improvement on X by doing that trick ... the sysfs
mmap API doesn't really provide a mean to do it explicitely from
userland (unlike the ioctl with the old proc api)

> Anyways, what I'm trying to say is that blinding turning prefetchable
> BAR into "don't set side effect bit in PTE" might not be so wise.

Well, I'm still trying to find a case that would break. I'm not doing
that for in-kernel mapping. Oh well, it's done anyway, I'll see what
reports I'm getting from it...

> I really think it's a userlevel decision. That's where all the ioctl()
> garbage came from for the /proc/bus/pci mmap() stuff. It was for chossing
> IO vs MEM space, and also for setting these kinds of mapping attributes.

I know. I just don't see how to replace it with sysfs ... maybe mmap
pgprot flags...


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at