Re: Multicast delays and high iowait
From: Bill Fink
Date: Wed Apr 02 2008 - 02:04:38 EST
On Tue, 1 Apr 2008, Matt Garman wrote:
> We're using multicast basically for some inter-process
> communication.
>
> We timestamp (and log, in a separate thread) all of our sends and
> receives, and do analysis on the logs.
>
> We're finding occasional (once or twice a day) "blips" where the
> receipt of multicast messages is delayed anywhere from 200
> milliseconds to three or four whole seconds.
>
> In one case, we have only one server in the network, and are still
> seeing this. In this scenario, do the multicast messages actually
> use the physical network?
>
> I'm running sar on these machines (collecting data every five
> seconds); any delay >600 ms seems to coincide with extremely high
> iowait (but the load on any CPU during these times is always below
> 1.0).
>
> We have the sysctl net.core.rmem_max parameter set to 33554432.
>
> Our code uses setsockopt() to set the receive buffer to the
> maximum size allowed by the kernel (i.e. 33554432 in our case).
>
> The servers are generally lightly loaded: typically they have a load
> of <1.0, and rarely does the load exceed 3.0---yet the servers have
> eight physical cores.
>
> This is with kernel 2.6.9-42.ELsmp, i.e. the default for CentOS 4.4.
>
> This doesn't appear to be a CPU problem. I wrote a simple multicast
> testing program. It sends a constant stream of messages, and, in a
> separate thread, logs the time of each send. I wrote a
> corresponding receive program (logs receive times in a separate
> thread). Running eight instances of cpuburn, I can't generate any
> significant delays. However, if I run something like
>
> dd bs=1024000 if=/dev/zero of=zeros.dat count=12288
>
> I can create multicast delays over one second. This will also
> generate high iowait in the sar log. However, in actual production
> use, no process should ever push the disk as hard as that "dd" test.
> (In other words, while I can duplicate the problem, I'm not sure
> it's a fair test).
>
> Any ideas or suggestions would be much appreciated. I don't really
> know enough about the kernel's network architecture to devise any
> more tests or know how else I might be able to pinpoint the cause of
> this problem.

Hi Matt,

One thing you could try is to set the CPU affinity of your client/server
and the NIC interrupts to one CPU and the disk interrupts to a different
CPU.
On my network test systems, I actually set the CPU affinity of all
the normal system processes to CPU 1, by adding the following at the
beginning of the /etc/rc.sysinit script (this is tailored for my
dual-CPU servers, so the CPU mask of "2", i.e. CPU 1, reflects my
particular CPU configuration):

    taskset -p 2 1
    taskset -p 2 $$

Then at the end of the /etc/rc.local script, I add:

    taskset -p 1 `ps ax | grep xinetd | grep -v grep | awk '{ print $1 }'`

which causes xinetd and any servers it spawns to run on CPU 0.
I also have in the /etc/rc.local script:

    echo 1 >> /proc/irq/`grep eth2 /proc/interrupts | awk '{ print $1 }' | sed 's/://'`/smp_affinity

This forces the NIC interrupts for the 10-GigE NIC (eth2) to be handled
by CPU 0.
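
For readability, the IRQ lookup in that one-liner can be pulled out into
a small helper. This is only a sketch: the function name irq_for_dev and
its optional second argument (a file to read instead of /proc/interrupts,
there purely so it can be tried on a saved copy) are made up for this
example. Unlike the plain grep above, it matches the device name exactly,
so "eth2" will not also pick up "eth20"; on the other hand it will miss
per-queue names such as "eth2-rx-0" if your driver registers those.

```shell
# Print the IRQ number(s) whose /proc/interrupts line names the given device.
# $1 = device name (e.g. eth2), $2 = optional file to read (hypothetical
# convenience for testing; defaults to /proc/interrupts).
irq_for_dev() {
    dev=$1
    file=${2:-/proc/interrupts}
    awk -v dev="$dev" '
        # Only consider lines that start with a numeric IRQ such as "24:"
        $1 ~ /^[0-9]+:$/ {
            for (i = 2; i <= NF; i++) {
                name = $i
                sub(/,$/, "", name)      # shared IRQs list names as "a, b"
                if (name == dev) {
                    sub(/:$/, "", $1)    # strip the trailing colon
                    print $1
                    next
                }
            }
        }
    ' "$file"
}

# Possible use in /etc/rc.local (must run as root):
#   for irq in $(irq_for_dev eth2); do
#       echo 1 > /proc/irq/$irq/smp_affinity
#   done
```

The same helper should work for the disk controller if you pass whatever
name it shows under in /proc/interrupts.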
There are no other active network interfaces on these servers, or I
would move their interrupts to CPU 1. And you might want to do likewise
for the disk interrupts (I may wind up doing this myself).
Finally, run the client/server command on the same CPU as the NIC
interrupts, e.g. in the above scenario you could run the client with:

    taskset 1 client [arguments]

Note that the taskset command has a very non-intuitive command structure
(at least to me), so consult the man page.
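
Part of the confusion is that the numeric argument is a hexadecimal
bitmask, not a CPU number: bit N set means CPU N is allowed, so "1" is
CPU 0, "2" is CPU 1, and "3" is both. The same encoding is used by
smp_affinity. As a rough illustration, a helper like the following
(cpu_mask is a name invented for this example, not part of taskset)
turns a list of CPU numbers into the mask:

```shell
# Convert zero-based CPU numbers into the hexadecimal bitmask that
# taskset and /proc/irq/N/smp_affinity expect (bit N set = CPU N allowed).
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 1      # prints 2 (CPU 1 only, the mask used in rc.sysinit above)
cpu_mask 0 1    # prints 3 (CPUs 0 and 1)
```

So on your eight-core boxes, "taskset $(cpu_mask 2 3) client [arguments]"
would confine the client to CPUs 2 and 3. Current versions of taskset
also accept a CPU list directly via the -c option, which avoids the mask
arithmetic altogether; check whether the version shipped with your
distribution has it.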
-Bill