Re: [block IO crash] Re: 2.6.39-rc5-git2 boot crashs

From: Ingo Molnar
Date: Thu May 05 2011 - 06:19:05 EST



* Tejun Heo <tj@xxxxxxxxxx> wrote:

> 2. Make irq toggling as cheap as preemption toggling. This can be
> achieved by implementing IRQ masking in software. I played with it
> a bit on x86 last year. I didn't get to finish it and it would
> have taken me some time to iron out weird failures but it didn't
> seem too difficult and as irq on/off is quite expensive on a lot of
> CPUs, this might bring overall performance benefit.
>
> For many archs, #2 would be the only choice and if we're gonna do that I
> think it would be best to do it on x86 too. It involves changes to common
> code too and x86 has the highest test/development coverage.

We played with this in -rt on and off, but note that -rt doesn't do it right
now. Interestingly, most of the irq-disable wrappery and state tracking code
needed for this is upstream already, via the lockdep irq state tracking patches.
(Surprise! :-)
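
I.e. with CONFIG_TRACE_IRQFLAGS every irqs-on/off transition already goes
through a common wrapper - roughly this shape (quoting include/linux/irqflags.h
from memory, so the details may be slightly off):

    #define local_irq_enable() \
        do { trace_hardirqs_on(); raw_local_irq_enable(); } while (0)

    #define local_irq_disable() \
        do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0)

    #define local_irq_save(flags)              \
        do {                                   \
            raw_local_irq_save(flags);         \
            trace_hardirqs_off();              \
        } while (0)

A soft-masking implementation could maintain its percpu 'irqs soft-disabled'
flag from exactly these hook points, which is why the wrappery overlaps so
much with what lockdep already needs.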

The disadvantages:

- register pressure increases: the pushf+cli+popf sequence has no register
side-effects, while a soft flag inevitably disturbs register allocation.
The cost is *possibly* quite low with the modern percpu implementation, but
this has to be measured very carefully, with disassembly.

- icache size increases - the percpu ops are larger than the minimal
pushf+cli+popf sequence. Again, this has to be measured, both via vmlinux
size analysis and via perf stat --repeat runs looking at icache pressure.

[ This is also an asymmetric cost: it increases the cost of the cache-cold
case, while most of the benefits are in the cache-hot case. ]

- irq replay becomes common and there's extra cost due to that. Also, we are
not ready to replay some types of irqs (the lapic timer for instance), at
least with the current code, so there's some ongoing maintenance cost there.
(See the sketch after this list.)
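
To make the replay point concrete, the slow path would have to look something
like this - a bare sketch with made-up names, not the -rt patches and not what
Tejun prototyped:

    #include <linux/percpu.h>
    #include <linux/types.h>

    /* Percpu software state replacing the hardware IF flag. */
    DEFINE_PER_CPU(int, soft_irqs_disabled);
    DEFINE_PER_CPU(unsigned int, soft_irq_pending); /* a real one needs a mask */

    void soft_irq_replay(void);         /* self-IPI or direct dispatch */

    /* Fast path: a single percpu store, no CLI. */
    static inline void soft_local_irq_disable(void)
    {
        this_cpu_write(soft_irqs_disabled, 1);
        barrier();
    }

    /* Hard irq entry: if we are soft-disabled, note the irq and bail out. */
    bool soft_irq_entry(unsigned int vector)
    {
        if (this_cpu_read(soft_irqs_disabled)) {
            /*
             * Every irq that arrives inside a soft-disabled section
             * ends up here - this is why replay becomes common.
             * Level-triggered sources need masking at the controller
             * as well, and the lapic timer is one of the types we
             * cannot sanely replay today.
             */
            this_cpu_write(soft_irq_pending, vector);
            return false;               /* defer the handler */
        }
        return true;                    /* handle normally */
    }

    /* Enable path: clear the flag, then replay whatever was deferred. */
    static inline void soft_local_irq_enable(void)
    {
        barrier();
        this_cpu_write(soft_irqs_disabled, 0);
        if (unlikely(this_cpu_read(soft_irq_pending)))
            soft_irq_replay();
    }

The extra test+branch in the enable path is part of the 'soft sequence' cost,
and the replay itself (a self-IPI, or calling the handler directly) is where
the not-easily-replayable irq types get hairy.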

The benefits are:

- lockdep is already tracking irqs-on/off sections rather carefully, so we know
all the places that play with irqs, and the ongoing maintenance cost is
shared with what we have to do for lockdep anyway.

- on Nehalem a "PUSHF; CLI; POPF" sequence is 18 cycles, while a soft sequence
would be more like 2 cycles. So we win around 15 cycles per sequence in the
fast path - minus the collateral slowpath costs above ... which are not
directly comparable. (See the comparison sketch after this list.)

- Stock mainline would become a kernel with truly hard-RT irq handling, one
that never ever disables hardirqs. Big wow factor, and precision-guided,
laser-mounted sharks!
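
For reference, here is a sketch of the two fast paths those cycle numbers are
comparing - the hard variant mirrors the usual x86 irqflags code, the soft
variant reuses the made-up percpu flag from the sketch further up:

    DECLARE_PER_CPU(int, soft_irqs_disabled);   /* from the earlier sketch */

    /* Hard variant: the PUSHF; CLI ... POPF pair, ~18 cycles in total. */
    static inline unsigned long hard_irq_save(void)
    {
        unsigned long flags;

        asm volatile("pushf ; pop %0 ; cli"
                     : "=rm" (flags) : : "memory");
        return flags;
    }

    static inline void hard_irq_restore(unsigned long flags)
    {
        asm volatile("push %0 ; popf"
                     : : "g" (flags) : "memory", "cc");
    }

    /*
     * Soft variant: percpu loads/stores only, roughly the ~2 cycle case.
     * The old state and the pending-replay test now go through general
     * purpose registers, which is where the register pressure mentioned
     * above comes from.
     */
    static inline unsigned long soft_irq_save(void)
    {
        unsigned long flags = this_cpu_read(soft_irqs_disabled);

        this_cpu_write(soft_irqs_disabled, 1);
        barrier();
        return flags;
    }

    static inline void soft_irq_restore(unsigned long flags)
    {
        barrier();
        this_cpu_write(soft_irqs_disabled, flags);
        /* plus the pending-replay check from the earlier sketch */
    }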

I probably missed a few factors, but these are the main considerations.

My firm judgement: "Dunno".

Thanks,

Ingo