On Sat, 26 May 2001, David S. Miller wrote:
> And looking at the x86 code, I don't even understand how your fixes
> can make a difference, what about the do_softirq() call in
> arch/i386/kernel/irq.c:do_IRQ()??? [...]
[you are right, it's a brain fart on my part. doh. i guess i was too happy
having fixed the longstanding latency problem.]
the TCP latency issues and the missed softirq execution bug are still
there, but for a slightly different reason.
the bug/misbehavior causing bad latencies turned out to be the following:
if a hardirq triggers a softirq while syscall-level code on the same CPU
has disabled local bhs via local_bh_disable(), then we 'miss' the execution
of the softirq until the next IRQ (or the next direct call to do_softirq()).
the attached softirq-2.4.5-B0 patch fixes this problem by calling
do_softirq() from local_bh_enable() [only if the bh count has dropped to 0,
to avoid recursion]. This slightly changes local_bh_enable() semantics:
do_softirq() disables and re-enables interrupts as a side-effect, so code
that calls local_bh_enable() while interrupts are disabled (and depends on
them staying disabled) will break. I checked all callers of
local_bh_enable() with a debugging check, and the only (harmless) violation
of this new rule is machine_restart() in the x86 tree.
Yesterday's patches fix this problem too, but only as a lucky side-effect,
and only in the idle-poll case. With 2.4.5 + softirq-2.4.5-B0, TCP latency
is down from a fluctuating 300-400 microseconds to a stable 109 microseconds.
Ingo
This archive was generated by hypermail 2b29 : Thu May 31 2001 - 21:00:33 EST