Re: [PATCH] dst numa: Avoid dst counter cacheline bouncing

From: David S. Miller
Date: Thu Jun 23 2005 - 22:33:32 EST

From: Christoph Lameter <christoph@xxxxxxxxxxx>
Date: Thu, 23 Jun 2005 20:10:06 -0700 (PDT)

> AIM7 results (tcp_test and udp_test) on i386 (IBM x445 8p 16G):
> No patch 94788.2
> w/patch 97739.2 (+3.11%)
> The numbers will be much higher on larger NUMA machines.

How much higher? I don't believe it. And %3 doesn't justify the
complexity (both in time and memory usage) added by this patch.

Performance of our stack's routing cache is _DEEPLY_ tied to the
size of the routing cache entries, of which dst entry is a member.
Every single byte counts.

You've exploded this to be (NR_NODES * sizeof(void *)) larger. That
is totally unacceptable, even for NUMA.

Secondly, inlining the "for_each_online_node()" loops isn't very nice

Consider making a per-node routing cache instead, just like the flow
cache is per-cpu, or make socket dst entries have a per-node array of
object pointers. Fill the per-node array in lazily, just as you do
for the dst. The first time you try to clone a dst on a cpu for a
socket, create the per-cpu entry slot.

We don't need to make them per-node system wide, only per-socket is
this really needed.

This way you do per-node walking when you detach the dst from the
socket at close() time, not at every dst_release() call, and thus for
every packet in the system.

In light of that, I don't see what the advantage is. Instead of
atomic inc/dec on every packet sent in the system, you walk the whole
array of counters for every packet sent in the system. If you really
get contention amongst nodes for a DST entry, this walk should result
in ReadToShare transactions, and thus cacheline movement, between the
NUMA nodes, on every __kfree_skb() call.

Essentially you're trading 1 atomic inc (ReadToOwn) and 1 atomic dec
(ReadToOwn) per packet for significant extra memory, much bigger code,
and 1 ReadToShare transaction per packet.

And since you're still using atomic_inc/atomic_dec you'll still hit
ReadToOwn transactions within the node.
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at