Re: [git pull] drm patches for 2.6.27-rc1

From: Ingo Molnar
Date: Sat Oct 18 2008 - 18:32:50 EST



* Keith Packard <keithp@xxxxxxxxxx> wrote:

> On Sat, 2008-10-18 at 22:37 +0200, Ingo Molnar wrote:
>
> > But i think the direction of the new GEM code is subtly wrong here,
> > because it tries to manage memory even on 64-bit systems. IMO it
> > should just map the _whole_ graphics aperture (non-cached) and be
> > done with it. There's no faster method at managing pages than the
> > CPU doing a TLB fill from pagetables.
>
> Yeah, we're stuck thinking that we "can't" map the aperture because
> it's too large, but with a 64-bit kernel, we should be able to keep it
> mapped permanently.
>
> Of course, the io_reserve_pci_resource and io_map_atomic functions
> could do precisely that, as kmap_atomic does on non-HIGHMEM systems
> today.

okay, so basically what we need is a shared API that does a per-page
kmap_atomic() on 32-bit, and just an ioremap() on 64-bit. I had the
impression that you were suggesting to extend kmap_atomic() to 64-bit -
which would be wrong.

So, in terms of the 4 APIs you suggest:

	struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
						   int bar,
						   int prot);
	void io_mapping_free(struct io_mapping *mapping);

	void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
	void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);

here is what we'd do on 64-bit:

- io_reserve_pci_resource() would just do an ioremap(), and would save
the ioremap-ed memory into struct io_mapping.

- io_mapping_free() does the iounmap()

- io_map_atomic(): just arithmetic, returns
mapping->base + (pfn << PAGE_SHIFT) - no TLB activity at all.

- io_unmap_atomic(): NOP.

it's as fast as it gets: zero overhead in essence. Note that it's also
shared between all CPUs and there's no aliasing trouble.
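
To make the 64-bit variant concrete, here is a minimal user-space model of
the four calls. It is only a sketch under stated assumptions: the layout of
struct io_mapping is not defined anywhere in this thread, so the fields
below are hypothetical, calloc() stands in for ioremap(), pfn is treated as
a page index relative to the start of the mapping, and the model_ names are
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical layout; the thread does not define struct io_mapping. */
struct io_mapping {
	unsigned char *base;	/* start of the permanent linear mapping */
	unsigned long npages;	/* size of the mapping in pages */
};

/* Models io_reserve_pci_resource(): calloc() plays the role of ioremap(). */
static struct io_mapping *model_io_reserve(unsigned long npages)
{
	struct io_mapping *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->base = calloc(npages, (size_t)1 << MODEL_PAGE_SHIFT);
	if (!m->base) {
		free(m);
		return NULL;
	}
	m->npages = npages;
	return m;
}

/* Models io_mapping_free(): free() plays the role of iounmap(). */
static void model_io_mapping_free(struct io_mapping *m)
{
	free(m->base);
	free(m);
}

/* Pure arithmetic on the saved base: no per-page mapping work at all. */
static void *model_io_map_atomic(struct io_mapping *m, unsigned long pfn)
{
	assert(pfn < m->npages);
	return m->base + ((size_t)pfn << MODEL_PAGE_SHIFT);
}

/* Nothing to undo, because the whole range stays mapped. */
static void model_io_unmap_atomic(struct io_mapping *m, unsigned long pfn)
{
	(void)m;
	(void)pfn;
}
```

The point of the model is only that map is an address computation and
unmap is empty; everything expensive happens once, at reserve time.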

And we could make it even faster: do you think we could even use 2MB
TLB entries for the _linear_ ioremap()s here, hm? There's plenty of
address space on 64-bit so we can align to 2MB just fine - and aperture
sizes are multiples of 2MB anyway.
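
The alignment argument can be checked with plain arithmetic. The helpers
below are a hypothetical sketch (the names are invented, and the 256MB
aperture in the usage note is only an example size): a range can be covered
by 2MB entries when both its base and its size are 2MB multiples, and the
number of translation entries needed drops accordingly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SZ_4K (UINT64_C(1) << 12)
#define MODEL_SZ_2M (UINT64_C(1) << 21)

/* True if [phys_base, phys_base + size) can be mapped with 2MB pages only. */
static bool model_can_use_2m_pages(uint64_t phys_base, uint64_t size)
{
	return size != 0 &&
	       (phys_base % MODEL_SZ_2M) == 0 &&
	       (size % MODEL_SZ_2M) == 0;
}

/* Number of page-table entries needed to cover size bytes (rounded up). */
static uint64_t model_entries_needed(uint64_t size, uint64_t page_size)
{
	assert(page_size != 0);
	return (size + page_size - 1) / page_size;
}
```

For a 256MB aperture, model_entries_needed() gives 65536 entries with 4K
pages and 128 entries with 2MB pages.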

Or we could go one step further and install these aperture mappings into
the _kernel linear_ address space. That would be even faster, because
we'd have a constant offset. We have the (2MB mappings aware) mechanism
for that already. (Yinghai Cc:-ed - he did a lot of great work to
generalize this area.)

(In fact if we installed it into the linear kernel address space, and if
the aperture is 1GB aligned, we would automatically use gbpages for it,
were Intel to support gbpages in the future ;-)

the _real_ remapping in a graphics aperture happens on the GPU level
anyway, you manage an in-RAM GPU pagetable that just works like an
IOMMU, correct?

on 32-bit we'd have what you use in the GEM code today:

- io_reserve_pci_resource(): a NOP in essence

- io_mapping_free(): a NOP

- io_map_atomic(): does a kmap_atomic(pfn)

- io_unmap_atomic(): does a kunmap_atomic(pfn)

so on 32-bit we have the INVLPG TLB overhead and preemption restrictions
- but we knew that. We'd have to allow kmap_atomic() on non-highmem as
well but that's fair.
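
For contrast with the 64-bit case, here is a small user-space model of the
32-bit contract. It is a sketch, not kernel code: struct model_slot and the
model_ functions are hypothetical, the single slot stands in for a
temporary kmap_atomic-style mapping, and the flushes counter merely models
one TLB invalidation per map/unmap pair (charging it at unmap time is an
assumption of this model).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical: one temporary mapping slot over a fake aperture. */
struct model_slot {
	unsigned char *backing;		/* stand-in for the aperture memory */
	unsigned long mapped_pfn;	/* page currently occupying the slot */
	int busy;			/* non-zero between map and unmap */
	unsigned long flushes;		/* simulated TLB invalidations */
};

/* Map one page; the slot cannot be reused until it is unmapped. */
static void *model_map_atomic32(struct model_slot *s, unsigned long pfn)
{
	assert(!s->busy);
	s->busy = 1;
	s->mapped_pfn = pfn;
	return s->backing + pfn * MODEL_PAGE_SIZE;
}

/* Unmap must pair with the preceding map and costs one simulated flush. */
static void model_unmap_atomic32(struct model_slot *s, unsigned long pfn)
{
	assert(s->busy && s->mapped_pfn == pfn);
	s->busy = 0;
	s->flushes++;
}
```

Every page touched costs one map/unmap pair here, which is the per-page
overhead the 64-bit linear mapping avoids.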

Mind sending patches for this? :-)

Ingo