My theory is that it is caused by better cache line usage. In the bulk
transfer case most packets have the same size (the device MTU), so the
buffers kept landing on the same cache lines and the cache wasn't used
effectively. Slab's cache colouring fixes that. The rest of the sk_buff
code has also been simplified, which should speed it up too.
It probably depends on the CPU and the L1 cache organisation.
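To illustrate the idea, here is a minimal sketch of what colouring does to
the cache set mapping. It is not the kernel's slab code. The constants
(32-byte lines, 128 sets, one 4096-byte page per slab) and the helper names
cache_set(), first_obj() and distinct_sets() are assumptions picked for the
example, and real behaviour depends on the actual L1 organisation.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative parameters only; assumptions, not measurements. */
#define CACHE_LINE 32u    /* bytes per cache line */
#define CACHE_SETS 128u   /* number of sets in the hypothetical L1 */
#define SLAB_SIZE  4096u  /* one page per slab */

/* Cache set that a byte address maps to. */
static unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr / CACHE_LINE) % CACHE_SETS);
}

/*
 * Address of the first object in slab number `slab`.
 * With colouring, each slab shifts its objects by one more cache line,
 * wrapping around after `colours` slabs. colours == 0 means no colouring.
 */
static uintptr_t first_obj(uintptr_t base, unsigned slab, unsigned colours)
{
    unsigned colour = colours ? slab % colours : 0;

    return base + (uintptr_t)slab * SLAB_SIZE
                + (uintptr_t)colour * CACHE_LINE;
}

/* Count how many distinct cache sets the first objects of n slabs hit. */
static unsigned distinct_sets(unsigned n_slabs, unsigned colours)
{
    unsigned char seen[CACHE_SETS] = {0};
    unsigned count = 0;

    for (unsigned s = 0; s < n_slabs; s++) {
        unsigned set = cache_set(first_obj(0, s, colours));

        if (!seen[set]) {
            seen[set] = 1;
            count++;
        }
    }
    return count;
}
```

With these numbers, distinct_sets(8, 0) puts the first object of all eight
slabs into a single cache set, while distinct_sets(8, 8) spreads them over
eight sets. Equal-sized buffers at equal offsets are the bad case, and that
is what MTU-sized packets look like.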
It would be possible to implement cache colouring in the old skbuff code,
but I would prefer not to because the new code is much nicer, and I think
duplicating mechanisms that are already present elsewhere should be avoided
if possible. The slabified version also preinitializes some state,
which would not be possible to implement with the old code.
This is only a theory. It would be interesting to compare the two
implementations with some of the Pentium cycle/cache miss counters turned
on. Any takers? @)
On a P90 the localhost TCP numbers increased from 11MB/s to 11.55MB/s.
They also got more stable: over 10 tests the numbers varied a lot without
cache colouring, while with it they were nearly constant. This shows that
it has an effect.
In the fast router case with lots of busmastering IO going on the improvements
were a lot more dramatic.
-Andi