Re: [this_cpu_xx V6 7/7] this_cpu: slub aggressive use of this_cpu operations in the hotpaths

From: David Rientjes
Date: Thu Oct 15 2009 - 05:04:51 EST


On Wed, 14 Oct 2009, Mel Gorman wrote:

> NETPERF TCP_STREAM
> Packet  netperf-tcp         tcp-SLUB            netperf-tcp         tcp-SLAB
> Size    SLUB-vanilla        this-cpu            SLAB-vanilla        this-cpu
>     64   1773.00 ( 0.00%)    1731.63 (-2.39%)*   1794.48 ( 1.20%)    2029.46 (12.64%)
>                    1.00%               2.43%               1.00%               1.00%
>    128   3181.12 ( 0.00%)    3471.22 ( 8.36%)    3296.37 ( 3.50%)    3251.33 ( 2.16%)
>    256   4794.35 ( 0.00%)    4797.38 ( 0.06%)    4912.99 ( 2.41%)    4846.86 ( 1.08%)
>   1024   9438.10 ( 0.00%)    8681.05 (-8.72%)*   8270.58 (-14.12%)   8268.85 (-14.14%)
>                    1.00%               7.31%               1.00%               1.00%
>   2048   9196.06 ( 0.00%)    9375.72 ( 1.92%)   11474.59 (19.86%)    9420.01 ( 2.38%)
>   3312  10338.49 ( 0.00%)*  10021.82 (-3.16%)*  12018.72 (13.98%)*  12069.28 (14.34%)*
>                    9.49%               6.36%               1.21%               2.12%
>   4096   9931.20 ( 0.00%)*  10285.38 ( 3.44%)*  12265.59 (19.03%)*  10175.33 ( 2.40%)*
>                    1.31%               1.38%               9.97%               8.33%
>   6144  12775.08 ( 0.00%)*  10559.63 (-20.98%)  13139.34 ( 2.77%)   13210.79 ( 3.30%)*
>                    1.45%               1.00%               1.00%               2.99%
>   8192  10933.93 ( 0.00%)*  10534.41 (-3.79%)*  10876.42 (-0.53%)*  10738.25 (-1.82%)*
>                   14.29%               2.10%              12.50%               9.55%
>  10240  12868.58 ( 0.00%)   12991.65 ( 0.95%)   10892.20 (-18.14%)  13106.01 ( 1.81%)
>  12288  11854.97 ( 0.00%)   12122.34 ( 2.21%)*  12129.79 ( 2.27%)*  12411.84 ( 4.49%)*
>                    1.00%               6.61%               5.78%               8.95%
>  14336  12552.48 ( 0.00%)*  12501.71 (-0.41%)*  12274.54 (-2.26%)   12322.63 (-1.87%)*
>                    6.05%               2.58%               1.00%               2.23%
>  16384  11733.09 ( 0.00%)*  12735.05 ( 7.87%)*  13195.68 (11.08%)*  14401.62 (18.53%)
>                    1.14%               9.79%              10.30%               1.00%
>
> The results for the patches are a bit all over the place for TCP_STREAM
> with big gains and losses depending on the packet size, particularly 6144
> for some reason. SLUB vs SLAB shows SLAB often has really massive advantages
> and this is not always for the larger packet sizes where the page allocator
> might be a suspect.
>

TCP_STREAM stresses a few specific caches:

              ALLOC_FASTPATH  ALLOC_SLOWPATH  FREE_FASTPATH  FREE_SLOWPATH
kmalloc-256          3868530         3450592          95628        7223491
kmalloc-1024         2440434             429        2430825          10034
kmalloc-4096         3860625         1036723          85571        4811779

For kmalloc-256 and kmalloc-4096, nearly every free takes the slowpath.
This demonstrates that freeing to full (or partial) slabs causes a lot of
pain since the fastpath normally can't be used there. Fixing that is
probably beyond the scope of this patchset.
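
To put numbers on that, here is a rough userspace sketch that turns the
counters above into slowpath percentages. It is not kernel code: the struct
and function names (slub_stat, slowpath_pct, print_slowpath_report) are made
up for illustration, and only the counter values come from the table.

```c
#include <stdio.h>

/* Hypothetical container for one row of the counters quoted above. */
struct slub_stat {
	const char *cache;
	unsigned long alloc_fastpath;
	unsigned long alloc_slowpath;
	unsigned long free_fastpath;
	unsigned long free_slowpath;
};

/* Counter values copied verbatim from the table above. */
static const struct slub_stat tcp_stream_stats[] = {
	{ "kmalloc-256",  3868530UL, 3450592UL,   95628UL, 7223491UL },
	{ "kmalloc-1024", 2440434UL,     429UL, 2430825UL,   10034UL },
	{ "kmalloc-4096", 3860625UL, 1036723UL,   85571UL, 4811779UL },
};

/* Percentage of operations that missed the fastpath (0.0 if no samples). */
double slowpath_pct(unsigned long fast, unsigned long slow)
{
	unsigned long total = fast + slow;

	return total ? 100.0 * (double)slow / (double)total : 0.0;
}

/* Print the alloc and free slowpath share for each cache. */
void print_slowpath_report(void)
{
	size_t i;
	size_t n = sizeof(tcp_stream_stats) / sizeof(tcp_stream_stats[0]);

	for (i = 0; i < n; i++) {
		const struct slub_stat *s = &tcp_stream_stats[i];

		printf("%-13s alloc slowpath %6.2f%%  free slowpath %6.2f%%\n",
		       s->cache,
		       slowpath_pct(s->alloc_fastpath, s->alloc_slowpath),
		       slowpath_pct(s->free_fastpath, s->free_slowpath));
	}
}
```

Calling print_slowpath_report() from main() with these figures shows
kmalloc-256 and kmalloc-4096 freeing through the slowpath more than 98% of
the time, against well under 1% for kmalloc-1024.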

It's also different from the cpu slab thrashing issue I identified with
the TCP_RR benchmark and had a patchset to somewhat improve. The
criticism was the addition of an increment to a fastpath counter in struct
kmem_cache_cpu which could probably now be much cheaper with these
optimizations.