[patch] softirq-2.4.5-E5

From: Ingo Molnar (mingo@elte.hu)
Date: Tue May 29 2001 - 12:49:57 EST


the attached softirq-2.4.5-E5 patch (against 2.4.5-ac3) tries to solve all
softirq, tasklet and scheduling latency problems i could identify while
testing TCP latencies over gigabit connections. The list of problems, as
of 2.4.5-ac3:

 - the need_resched check in the arch/i386/kernel/entry.S syscall/irq
   return code has a race that makes it possible to miss a reschedule for
   up to smp_num_cpus*HZ jiffies.

 - the softirq check in entry.S has a race as well.

 - on x86, APIC interrupts do not trigger do_softirq(). This is especially
   problematic with the smptimers patch, which is APIC-irq driven.

 - local_bh_disable() blocks the execution of do_softirq(), and it takes
   a nondeterministic amount of time after local_bh_enable() for the next
   do_softirq() to be triggered.

 - do_softirq() does not execute softirqs that get activated while it is
   running, and the next do_softirq() run happens after a nondeterministic
   amount of time.

 - the tasklet design occasionally re-enables its driving softirq, which
   makes 'complete' softirq processing impossible.

the patch (tries to) solve all these problems. The changes:

 - all softirqs are guaranteed to be handled after do_softirq() returns
   (even those which are activated during the softirq run)

 - softirq handling is restarted immediately when bhs are re-enabled.

 - the tasklet code got rewritten (but externally visible semantics are
   kept) to not rely on marking the softirq busy. The new code is a bit
   tricky, but it should be correct.

 - some code got a bit slower, some code got a bit faster. I believe most
   of the changes made the softirq/tasklet implementation clearer.

 - some minor uninlining of too big inline functions, and other cleanup
   was done as well.

 - no global serialization was added to any part of the softirq or tasklet
   code, so scalability is not impacted.
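
a rough user-space model of the first two changes, for illustration only.
This is not code from the patch: every name ending in _model, plus
run_snapshot(), rx_handler() and the pending/runs variables, is made up
here, and real per-CPU state, interrupts and locking are left out. It
contrasts a single-pass do_softirq(), which leaves late arrivals pending,
with one that restarts until nothing is pending, and it reruns softirq
processing when the outermost bh-disable section ends:

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS_MODEL 4

typedef void (*softirq_fn)(void);

static uint32_t pending;                       /* bit nr set: softirq nr waits */
static int bh_disabled;                        /* nesting count of bh-disable */
static int running;                            /* nonzero while handlers run */
static softirq_fn handlers[NR_SOFTIRQS_MODEL]; /* NULL means "no handler" */
static int runs[NR_SOFTIRQS_MODEL];            /* how often each softirq ran */

static void raise_model(int nr)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS_MODEL);
    pending |= UINT32_C(1) << nr;
}

/* Run every softirq contained in one snapshot of the pending mask. */
static void run_snapshot(void)
{
    uint32_t snap = pending;

    pending = 0;
    for (int nr = 0; nr < NR_SOFTIRQS_MODEL; nr++) {
        if (!(snap & (UINT32_C(1) << nr)))
            continue;
        runs[nr]++;
        if (handlers[nr])
            handlers[nr]();                    /* may raise further softirqs */
    }
}

/* Single pass: softirqs raised by a handler stay pending on return. */
static void do_softirq_once_model(void)
{
    if (bh_disabled || running)
        return;
    running = 1;
    run_snapshot();
    running = 0;
}

/* Restarting variant: loop until nothing is pending.
 * Assumption of this model: the loop is unbounded, so a handler that
 * always re-raises a softirq would spin here. */
static void do_softirq_model(void)
{
    if (bh_disabled || running)
        return;
    running = 1;
    while (pending)
        run_snapshot();
    running = 0;
}

static void local_bh_disable_model(void) { bh_disabled++; }

/* Leaving the outermost bh-disabled section reruns softirqs at once. */
static void local_bh_enable_model(void)
{
    if (--bh_disabled == 0)
        do_softirq_model();
}

/* Hypothetical handler: softirq 0 activates softirq 1 while it runs. */
static void rx_handler(void) { raise_model(1); }

/* Walks through the three cases; returns 0 if the model behaves as described. */
static int softirq_model_demo(void)
{
    handlers[0] = rx_handler;

    raise_model(0);
    do_softirq_once_model();
    if (pending != 2u) return 1;               /* softirq 1 was left behind */
    do_softirq_model();
    if (pending != 0) return 2;

    raise_model(0);
    do_softirq_model();                        /* handles 0, then 1, in one call */
    if (pending != 0 || runs[0] != 2 || runs[1] != 2) return 3;

    local_bh_disable_model();
    raise_model(2);
    do_softirq_model();                        /* blocked while bhs are disabled */
    if (runs[2] != 0) return 4;
    local_bh_enable_model();                   /* runs softirq 2 immediately */
    if (pending != 0 || runs[2] != 1) return 5;
    return 0;
}
```

calling softirq_model_demo() from a small main() and checking for a zero
return exercises all three cases. The single-pass variant is kept only for
contrast with the restarting loop.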

the patch is stable under every workload i tried. It handles softirqs and
tasklets with the minimum possible latency and thus maximizes cache
locality. The patch has no known bugs, and i know of no remaining
lost-wakeup or lost-softirq problems in the kernel. TCP latencies and TCP
throughput are picture-perfect.

Comments?

        Ingo


