[announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5

From: Ingo Molnar (mingo@elte.hu)
Date: Mon Oct 01 2001 - 17:16:06 EST


to sum things up, we have three main problem areas that are connected to
hardirq and softirq processing:

- a little utility written by Simon Kirby proved that no matter how much
  softirq throttling, it's easy to lock up a pretty powerful Linux
  box via a high rate of network interrupts, from relatively low-powered
  clients as well. 2.4.6, 2.4.7, 2.4.10 all lock up. Alexey said it as
  well that it's still easy to lock up low-powered Linux routers via more
  or less normal traffic.

- prior 2.4.7 we used to 'leak' softirq handling => we ended up missing
  softirqs in a number of circumstances. Stock 2.4.10 still has a number
  of places that do this too.

- a number of people have reported gigabit performance problems (some
  people reported a 10-20% drop in performance under load) since
  ksoftirqd was added - which was added to fix some of the 2.4.6-
  softirq-handling latency problems.

we also have another problem that often pops up when the BIOS goes bad or
a device driver does some mistake:

- Linux often 'locks up' if it gets into a 'interrupt storm' - when
  interrupt sources that send a very high rate of interrupts. This can be
  seen as boot-time hangs and module-insert-time hangs as well.

the attached patch, while a bit radical, is i believe a robust solution to
all four problems. It gives gigabit performance back, avoids the lockups
and attempts to reach as short softirq-processing latency as possible.

the new mechanizm:

- the irq handling code has been extended to support 'soft mitigation',
  ie. to mitigate the rate of hardware interrupts, without support from
  the actual hardware. There is a reasonable default, but the value can
  also be decreased/increased on a per-irq basis via /proc/irq/NR/max_rate.

the method is the following. We count the number of interrupts serviced,
and if within a jiffy there are more than max_rate interrupts, the code
disables the IRQ source and marks it as IRQ_MITIGATED. On the next timer
interrupt the irq_rate_check() function is called, which makes sure that
'blocked' irqs are restarted & handled properly. The interrupt is disabled
in the interrupt controller, which has the nice side-effect of fixing and
blocking interrupt storms. (The support code for 'soft mitigation' is
designed to be very lightweight, it's a decrement and a test in the IRQ
handling hot path.)

(note that in case of shared interrupts, another 'innocent' device might
stay disabled for some short amount of time as well - but this is not an
issue because this mitigation does not make that device inoperable, it
just delays its interrupt by up to 10 msecs. Plus, modern systems have
properly distributed interrupts.)

- softirq code got simplified significantly. The concept is to 'handle all
  pending softirqs' - just as the hardware IRQ code 'handles all hardware
  interrupts that were passed to it'. Since most of the time there is a
  direct relationship between softirq work and hardirq work, the
  mitigation of hardirqs mitigates softirq load as well.

- ksoftirqd is gone, there is never any softirq pending while
  softirq-unaware code is executing.

- the tasklet code needed some cleanup along the way, and it also won some
  restart-on-enable and restart-on-unlock properties that it lacked
  before. (but which is desired.)

due to these changes, the linecount in softirq.c got smaller by 25%.
[i dropped the unwakeup change - but that one could be useful in the VM,
to eg. unwakeup bdflush or kswapd.]

- drivers can optionally use the set_irq_rate(irq, new_rate) call to
  change the current IRQ rate. Drivers are the ones who know best what
  kind of loads to expect from the hardware, so they might want to
  influence this value. Also, drivers that implement IRQ mitigation
  themselves in hardware, can effectively disable the soft-mitigation code
  by using a very high rate value.

what is the concept behind all this? Simplicity, and concept. We were
clearly heading in the wrong direction: putting more complexity into the
core softirq code to handle some really extreme and unusual cases. Also,
softirqs were slowly morphing into something process-ish - but in Linux we
already have a concept of processes, so we'd have two dualling concepts.
(We still have tasklets, which are not really processes - they are
single-threaded paths of execution.)

with this patch, softirqs can again be what they should be: lightweight
'interrupt code' that processes hard-IRQ events but still does this with
interrupts enabled, to allow for low hard-IRQ latencies. Anything that is
conceptually heavyweight IMO does not belong into softirqs, it should be
moved into process contexts. That will take care of CPU-time usage
accounting and CPU-time-limiting and priority issues as well.

(the patch also imports the latency and softirq-restart fixes from my
previous softirq patches.)

i've tested the patch on both UP, SMP, XT-PIC and APIC systems, it
correctly limits network interrupt rates (and other device interrupt
rates) to the given limit. I've done stress-testing as well. The patch is
against 2.4.11-pre1, but it applies just fine to the -ac tree as well.

with a high irq-rate limit set, ping flooding has this effect on the
test-system:

 [root@mars /root]# vmstat 1
    procs memory swap io
  r b w swpd free buff cache si so bi bo in
  0 0 0 0 877024 1140 11364 0 0 12 0 30960
  0 0 0 0 877024 1140 11364 0 0 0 0 30950
  0 0 0 0 877024 1140 11364 0 0 0 0 30520

ie. 30k interrupts/sec. With the max_rate set to 1000 interrupts/sec:

 [root@mars /root]# echo 1000 > /proc/irq/21/max_rate
 [root@mars /root]# vmstat 1
    procs memory swap io
  r b w swpd free buff cache si so bi bo in
  0 0 0 0 877004 1144 11372 0 0 0 0 1112
  0 0 0 0 877004 1144 11372 0 0 0 0 1111
  0 0 0 0 877004 1144 11372 0 0 0 0 1111

so it works just fine here. Interactive tasks are still snappy over the
same interface.

Comments, reports, suggestions and testing feedback is more than welcome,

        Ingo



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Oct 07 2001 - 21:00:17 EST