Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because ofvirt_to_slab(objp); nodeid = slabp->nodeid;

From: Eric Dumazet
Date: Tue Mar 20 2007 - 18:10:05 EST


Andi Kleen a écrit :
Is it possible virt_to_slab(objp)->nodeid being different from pfn_to_nid(objp) ?
It is possible the page allocator falls back to another node than requested. We would need to check that this never occurs.

The only way to ensure that would be to set a strict mempolicy.
But I'm not sure that's a good idea -- after all you don't want
to fail an allocation in this case.

But pfn_to_nid on the object like proposed by Eric should work anyways.
But I'm not sure the tables used for that will be more often cache hot
than the slab.

pfn_to_nid() on most x86_64 machines access one cache line (struct memnode).

Node 0 MemBase 0000000000000000 Limit 0000000280000000
Node 1 MemBase 0000000280000000 Limit 0000000480000000
NUMA: Using 31 for the hash shift.

On this example, we use only 8 bytes of memnode.embedded_map[] to find nid of all 16 GB of ram. On profiles I have, memnode is always hot (no cache miss on it).

While virt_to_slab() has to access :

1) struct page -> page_get_slab() (page->lru.prev) (one cache miss)
2) struct slab -> nodeid (one other cache miss)


So using pfn_to_nid() would avoid 2 cache misses.

I understand we want to do special things (fallback and such tricks) at allocation time, but I believe that we can just trust the real nid of memory at free time.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/