Re: Multicast delays and high iowait
From: H. Willstrand
Date: Tue Apr 01 2008 - 16:24:30 EST
On Tue, Apr 1, 2008 at 6:05 PM, Matt Garman <matthew.garman@xxxxxxxxx> wrote:
> We're using multicast basically for some inter-process
> communication.
>
Which protocol(s) are in use? (UDP, IGMP, ...)
> We timestamp (and log, in a separate thread) all of our sends and
> receives, and do analysis on the logs.
>
Are the timestamps carried inside the multicast messages? If so, could
unsynchronized clocks between sender and receiver account for the
apparent "delays"?
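One way to take clock synchronization out of the picture, assuming the
sender really does emit at a steady rate, is to look only at the gaps
between consecutive receive times on the receiver's own clock. A minimal
sketch follows; the function name find_gaps and the 0.2 s default
threshold are my own illustrative choices, not anything from your code:

```python
def find_gaps(arrivals, threshold_s=0.2):
    """Return (index, gap_seconds) for every inter-arrival gap above threshold_s.

    `arrivals` is a list of receive times in seconds taken from a single
    clock on the receiving host (hypothetical input format), in arrival order.
    """
    gaps = []
    for i in range(1, len(arrivals)):
        gap = arrivals[i] - arrivals[i - 1]
        if gap > threshold_s:
            # Index i is the message that arrived late.
            gaps.append((i, gap))
    return gaps
```

If the blips still show up in this receiver-only view, clock skew between
hosts is probably not the explanation.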
> We're finding occasional (once or twice a day) "blips" where the
> receipt of multicast messages is delayed anywhere from 200
> milliseconds to three or four whole seconds.
>
> In one case, we have only one server in the network, and are still
> seeing this. In this scenario, do the multicast messages actually
> use the physical network?
>
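As far as I know, delivery to group members on the sending host itself is
handled inside the kernel by looping the outgoing datagram back, which is
controlled per socket by IP_MULTICAST_LOOP (on by default on Linux). So
local receivers should not depend on the wire, although the datagram is
normally still transmitted on the interface as well. A small sketch for
checking what your sending socket has set; make_sender and loop_enabled
are hypothetical helper names:

```python
import socket


def make_sender(loop=True, ttl=1):
    """Create a UDP socket for multicast sending with explicit loop and TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Loop outgoing multicast back to group members on this host.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1 if loop else 0)
    # TTL 1 keeps the datagrams on the local subnet.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def loop_enabled(sock):
    """Report whether local loopback of outgoing multicast is enabled."""
    return sock.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP) == 1
```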
> I'm running sar on these machines (collecting data every five
> seconds); any delay >600 ms seems to coincide with extremely high
> iowait (but the load on any CPU during these times is always below
> 1.0).
>
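A five-second sar interval averages over the whole blip, so it may be
worth sampling iowait at a finer grain around the delays. The aggregate
"cpu" line in /proc/stat carries the counters (user, nice, system, idle,
iowait, ...), and the iowait share between two samples can be computed
from the deltas. A rough sketch, with function names of my own choosing:

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if len(fields) < 6 or fields[0] != "cpu":
        raise ValueError("expected aggregate 'cpu' line with an iowait field")
    return [int(x) for x in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Use at most the first eight counters (user..steal); later fields,
    # where present, are already included in user time.
    deltas = [y - x for x, y in zip(b[:8], a[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait
```

Reading the first line of /proc/stat twice, a few hundred milliseconds
apart, and passing both strings to iowait_percent gives one sample.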
> We have the sysctl net.core.rmem_max parameter set to 33554432.
>
> Our code uses setsockopt() to set the receive buffer to the
> maximum size allowed by the kernel (i.e. 33554432 in our case).
>
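It may be worth confirming what the kernel actually granted. On Linux the
value passed with SO_RCVBUF is capped by net.core.rmem_max and then
doubled to allow for bookkeeping overhead, so getsockopt() typically
reports twice the effective request. A minimal sketch; request_rcvbuf is
a hypothetical helper name:

```python
import socket


def request_rcvbuf(sock, requested):
    """Ask for a receive buffer of `requested` bytes and return what was granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    # Read the value back: the kernel may have capped and/or doubled it.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

For example, calling request_rcvbuf(sock, 33554432) on your receiving
socket and logging the result at startup would show whether the 32 MB
request is taking effect.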
> The servers are generally lightly loaded: typically they have a load
> of <1.0, and rarely does the load exceed 3.0---yet the servers have
> eight physical cores.
>
> This is with kernel 2.6.9-42.ELsmp, i.e. the default for CentOS 4.4.
>
> This doesn't appear to be a CPU problem. I wrote a simple multicast
> testing program. It sends a constant stream of messages, and, in a
> separate thread, logs the time of each send. I wrote a
> corresponding receive program (logs receive times in a separate
> thread). Running eight instances of cpuburn, I can't generate any
> significant delays. However, if I run something like
>
> dd bs=1024000 if=/dev/zero of=zeros.dat count=12288
>
> I can create multicast delays over one second. This will also
> generate high iowait in the sar log. However, in actual production
> use, no process should ever push the disk as hard as that "dd" test.
> (In other words, while I can duplicate the problem, I'm not sure
> it's a fair test).
>
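One thing to rule out, since the dd test stresses the disk: where exactly
is the timestamp taken? If it is taken in the logging thread, or if the
send/receive thread can ever block behind the log write, a stalled disk
would show up as a network "delay". A sketch of the arrangement I would
expect, with the stamp taken in the I/O thread before the hand-off; the
names run_stamped, source and sink are illustrative only:

```python
import queue
import threading
import time


def run_stamped(source, sink):
    """Stamp each event in the calling thread; pass it to `sink` from a logger thread.

    `source` is any iterable of events (e.g. received messages) and `sink`
    is a callable that records one (event, timestamp) pair, possibly slowly.
    """
    q = queue.Queue()  # unbounded, so the stamping thread never blocks on put()

    def logger():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more events
                break
            sink(item)

    t = threading.Thread(target=logger)
    t.start()
    for event in source:
        # The timestamp is taken here, before any disk I/O can interfere.
        q.put((event, time.monotonic()))
    q.put(None)
    t.join()
```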
> Any ideas or suggestions would be much appreciated. I don't really
> know enough about the kernel's network architecture to devise any
> more tests or know how else I might be able to pinpoint the cause of
> this problem.
>
> Thank you,
> Matt
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-net" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>