From: raz ben yehuda
Date: Fri Aug 28 2009 - 11:22:52 EST

On Fri, 2009-08-28 at 09:25 -0400, Rik van Riel wrote:
> raz ben yehuda wrote:
> > yes. latency is a crucial property.
> In the case of network packets, wouldn't you get a lower
> latency by transmitting the packet from the CPU that
> knows the packet should be transmitted, instead of sending
> an IPI to another CPU and waiting for that CPU to do the
> work?
Hello Rik,
If I understand you correctly, you are saying that I pass 1.5K packets to
an offline CPU?
If so, that is not what I do, because you are quite right: it would
not make any sense.
I do not pass packets to an offline CPU; I pass assignments. An
assignment is a buffer with some context describing what to do with it
(like aio), and a buffer is ~1MB. Also, the offline processor holds the
network interface as its own interface. No two offline processors transmit
over a single interface. (I modified the bonding driver to work with offline
processors for that.) I am aware of per-processor network queues, but
benchmarks proved this was better. (I do not have these benchmarks.)
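
A minimal sketch of how such an assignment hand-off could look, written as
userspace C11 rather than kernel code. This is not the offsched
implementation: the struct layout, the names (assignment, assign_ring,
ring_post, ring_poll) and the ring depth are all hypothetical. It assumes
exactly one producer CPU and one consumer CPU per ring, so plain
acquire/release loads and stores suffice and no IPI is needed to deliver
work; the consumer simply polls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 8 /* hypothetical depth; must be a power of two */
static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0, "RING_SLOTS must be a power of two");

/* Hypothetical assignment: a large buffer plus context saying what to do with it. */
struct assignment {
    void  *buf;     /* payload, ~1MB in the setup described above */
    size_t len;     /* number of valid bytes in buf */
    int    ifindex; /* interface owned by the consuming CPU (assumed field) */
};

/* Single-producer, single-consumer ring. */
struct assign_ring {
    struct assignment slots[RING_SLOTS];
    _Atomic size_t head; /* written only by the producer */
    _Atomic size_t tail; /* written only by the consumer */
};

static void ring_init(struct assign_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false if the ring is full. */
static bool ring_post(struct assign_ring *r, const struct assignment *a)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1)] = *a;
    /* Publish the slot contents before making the new head visible. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (the dedicated CPU polls this): returns false if empty. */
static bool ring_poll(struct assign_ring *r, struct assignment *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail == head)
        return false;
    *out = r->slots[tail & (RING_SLOTS - 1)];
    /* Hand the slot back to the producer only after copying it out. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Each ~1MB assignment costs one small slot copy on each side, so the
inter-CPU traffic per byte transmitted is far lower than handing over
individual 1.5K packets would be.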
Also, these engines do not release any sk_buffs to the operating system;
the packets are reused over and over to reduce the latency of
memory allocation and cache misses.
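
The buffer reuse described above can be illustrated with a small recycling
pool. This is a userspace sketch under stated assumptions, not the actual
sk_buff handling: pkt_buf, pkt_pool, pool_get and pool_put are hypothetical
names, the sizes are arbitrary, and the pool is assumed to be owned by a
single CPU, so no locks or atomic operations are needed.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BUFS 4    /* hypothetical pool depth */
#define BUF_SIZE  2048 /* hypothetical per-packet buffer size */

struct pkt_buf {
    unsigned char data[BUF_SIZE];
    size_t len; /* number of valid bytes in data */
};

/* Pool owned by exactly one CPU: plain loads and stores only. */
struct pkt_pool {
    struct pkt_buf  bufs[POOL_BUFS];
    struct pkt_buf *free[POOL_BUFS]; /* LIFO stack of idle buffers */
    size_t nfree;
};

static void pool_init(struct pkt_pool *p)
{
    for (size_t i = 0; i < POOL_BUFS; i++) {
        p->bufs[i].len = 0;
        p->free[i] = &p->bufs[i];
    }
    p->nfree = POOL_BUFS;
}

/* Take an idle buffer, or NULL if all are in flight. Never calls the allocator. */
static struct pkt_buf *pool_get(struct pkt_pool *p)
{
    return p->nfree ? p->free[--p->nfree] : NULL;
}

/* Return a buffer after transmit completes so it can be reused. */
static void pool_put(struct pkt_pool *p, struct pkt_buf *b)
{
    assert(p->nfree < POOL_BUFS);
    b->len = 0;
    p->free[p->nfree++] = b;
}
```

The LIFO order means the next pool_get returns the most recently returned
buffer, which is the one most likely to still be in cache.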
Also, in some cases I disabled the transmit interrupts and released
packets (--skb->users was still greater than 0, so not really a release) in
an offline context. I learned this from the Chelsio driver. This way, I
took more load off the operating system. It proved to be better in
large 1Gbps arrays, and I was able to remove atomic_inc/atomic_dec in some
variants of the code; atomic operations cost a lot.
In MSI cards I did not find it the example I showed, I use MSI
and the system is almost idle.
Also, as I recall, an IPI will not pass to an offloaded processor. offsched,
it runs NMI.
Also, I would like to express my apologies if any of this correspondence
seems as if I am trying to PR offsched. I am not.
> Inter-CPU communication has always been the bottleneck
> when it comes to SMP performance. Why does adding more
> inter-CPU communication make your system faster, instead
> of slower like one would expect?

