the attached softirq-2.4.5-E5 patch (against 2.4.5-ac3) tries to solve all
softirq, tasklet and scheduling latency problems i could identify while
testing TCP latencies over gigabit connections. The list of problems, as
of 2.4.5-ac3:

- the need_resched check in the arch/i386/kernel/entry.S syscall/irq
  return code has a race that makes it possible to miss a reschedule for
  up to smp_num_cpus*HZ jiffies.

- the softirq check in entry.S has a race as well.

- on x86, APIC interrupts do not trigger do_softirq(). This is especially
  problematic with the smptimers patch, which is APIC-irq driven.

- local_bh_disable() blocks the execution of do_softirq(), and it takes
  a nondeterministic amount of time after local_bh_enable() for the next
  do_softirq() to be triggered.

- do_softirq() does not execute softirqs that got activated while it was
  running, and the next do_softirq() run happens after a nondeterministic
  amount of time.

- the tasklet design occasionally re-enables its driving softirq, which
  makes 'complete' softirq processing impossible.
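The "activated meanwhile" problem from the list above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is not the 2.4.5-ac3 code, and every name in it (model_raise(), model_do_softirq_single_pass(), the MODEL_* constants) is hypothetical. It assumes a handler that raises another softirq while it runs, and a do_softirq() that takes one snapshot of the pending mask and returns after processing it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum { MODEL_NET_RX = 0, MODEL_NET_TX = 1, MODEL_NR = 2 };

static uint32_t model_pending;      /* one bit per pending softirq */
static int model_runs[MODEL_NR];    /* how often each handler has run */

static void model_raise(int nr)
{
    assert(nr >= 0 && nr < MODEL_NR);
    model_pending |= 1u << nr;
}

/* Assumption: the RX handler activates TX while softirqs are processed. */
static void model_handler(int nr)
{
    model_runs[nr]++;
    if (nr == MODEL_NET_RX)
        model_raise(MODEL_NET_TX);
}

/* Single pass: snapshot the pending mask once, run it, then return. */
static void model_do_softirq_single_pass(void)
{
    uint32_t active = model_pending;

    model_pending = 0;
    for (int nr = 0; nr < MODEL_NR; nr++)
        if (active & (1u << nr))
            model_handler(nr);
}

/* Returns the mask that is still pending after one pass. */
static uint32_t model_leftover_after_single_pass(void)
{
    model_pending = 0;
    model_runs[MODEL_NET_RX] = 0;
    model_runs[MODEL_NET_TX] = 0;

    model_raise(MODEL_NET_RX);
    model_do_softirq_single_pass();
    return model_pending;
}
```

In this model the TX bit is still set when the pass returns, so TX has to wait for whatever triggers the next run. That waiting time is the nondeterministic latency described above.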
the patch tries to solve all of these problems. The changes:

- all softirqs are guaranteed to be handled by the time do_softirq()
  returns (even those which are activated during the softirq run).

- softirq handling is restarted immediately when bhs are re-enabled.

- the tasklet code got rewritten (but externally visible semantics are
  kept) to not rely on marking the softirq busy. The new code is a bit
  tricky, but it should be correct.

- some code got a bit slower, some code got a bit faster. I believe most
  of the changes made the softirq/tasklet implementation clearer.

- some minor uninlining of too-big inline functions, and other cleanup,
  was done as well.

- no global serialization was added to any part of the softirq or tasklet
  code, so scalability is not impacted.
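The first two changes in the list above can be sketched in the same user-space style. This is an illustration of the stated guarantees, not the code in the patch: all names (sk_do_softirq(), sk_bh_enable(), sk_chain and so on) are hypothetical, and the unbounded loop is an assumption made to keep the model short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; names are illustrative, not kernel APIs. */
enum { SK_NR = 2 };

static uint32_t sk_pending;   /* one bit per pending softirq */
static int sk_bh_count;       /* > 0 means bottom halves are disabled */
static int sk_handled;        /* total handler invocations */
static int sk_chain;          /* how often handler 0 still raises softirq 1 */

static void sk_raise(int nr)
{
    assert(nr >= 0 && nr < SK_NR);
    sk_pending |= 1u << nr;
}

static void sk_handler(int nr)
{
    sk_handled++;
    if (nr == 0 && sk_chain > 0) {
        sk_chain--;
        sk_raise(1);          /* activated during the softirq run */
    }
}

/* Keep processing until nothing is pending, including late arrivals. */
static void sk_do_softirq(void)
{
    if (sk_bh_count > 0)
        return;               /* blocked; sk_bh_enable() picks it up */

    sk_bh_count++;            /* prevents nested runs */
    while (sk_pending) {
        uint32_t active = sk_pending;

        sk_pending = 0;
        for (int nr = 0; nr < SK_NR; nr++)
            if (active & (1u << nr))
                sk_handler(nr);
    }
    sk_bh_count--;
}

static void sk_bh_disable(void)
{
    sk_bh_count++;
}

/* Re-enabling bhs restarts softirq handling right away if work is pending. */
static void sk_bh_enable(void)
{
    assert(sk_bh_count > 0);
    if (--sk_bh_count == 0 && sk_pending)
        sk_do_softirq();
}

/* Returns 1 if the deferred work ran completely on sk_bh_enable(). */
static int sk_demo(void)
{
    int before;

    sk_pending = 0;
    sk_bh_count = 0;
    sk_handled = 0;
    sk_chain = 1;

    sk_bh_disable();
    sk_raise(0);
    sk_do_softirq();          /* no-op: bhs are disabled */
    before = sk_handled;
    sk_bh_enable();           /* runs softirq 0, then the late softirq 1 */

    return before == 0 && sk_handled == 2 && sk_pending == 0;
}
```

Here sk_demo() raises a softirq while bhs are disabled. Nothing runs until sk_bh_enable(), at which point both the original softirq and the one raised during the run are handled before control returns.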
the patch is stable under every workload i tried and handles softirqs and
tasklets with the minimum possible latency, which also maximizes cache
locality. The patch has no known bugs, and i know of no remaining
lost-wakeup or lost-softirq problem in the kernel. TCP latencies and TCP
throughput are picture-perfect.

Comments?

	Ingo
This archive was generated by hypermail 2b29 : Thu May 31 2001 - 21:00:40 EST