Re: >10% performance degradation since 2.6.18

From: Daniel J Blueman
Date: Tue Jul 07 2009 - 18:06:16 EST


On Mon, Jul 6, 2009 at 10:58 PM, <Chetan.Loke@xxxxxxxxxx> wrote:
>> -----Original Message-----
>> From: linux-kernel-owner@xxxxxxxxxxxxxxx
>> [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of
>> Daniel J Blueman
>> Sent: Sunday, July 05, 2009 7:01 AM
>> To: Matthew Wilcox; Andi Kleen
>> Cc: Linux Kernel; Jens Axboe; Arjan van de Ven
>> Subject: Re: >10% performance degradation since 2.6.18
>>
>> On Jul 3, 9:10 pm, Arjan van de Ven <ar...@xxxxxxxxxxxxx> wrote:
>> > On Fri, 3 Jul 2009 21:54:58 +0200
>> >
>> > Andi Kleen <a...@xxxxxxxxxxxxxx> wrote:
>> > > > That would seem to be a fruitful avenue of investigation --
>> > > > whether limiting the cards to a single RX/TX interrupt would be
>> > > > advantageous, or whether spreading the eight interrupts
>> out over
>> > > > the CPUs would be advantageous.
>> >
>> > > The kernel should really do the per cpu binding of MSIs
>> by default.
>> >
>> > ... so that you can't do power management on a per socket basis?
>> > hardly a good idea.
>> >
>> > just need to use a new enough irqbalance and it will spread out the
>> > interrupts unless your load is low enough to go into low power mode.
>>
>> I was finding newer kernels (>~2.6.24) would set the
>> Redirection Hint bit in the MSI address vector, allowing the
>> processors to deliver the interrupt to the lowest interrupt
>> priority (eg idle, no powersave) core
>> (http://www.intel.com/Assets/PDF/manual/253668.pdf pp10-66)
>> and older irqbalance daemons would periodically naively
>> rewrite the bitmask of cores, delivering the interrupt to a
>> static one.
>>
>> Thus, it may be worth checking if disabling any older
>> irqbalance daemon gives any win.
>>
>> Perhaps there is value in writing different subsets of cores
>> to the MSI address vector core bitmask (with the redirection
>> hint enabled) for different I/O queues on heavy interrupt
>> sources? By default, it's all cores.
>>
>
> Possible enhancement -
>
> 1) Drain the responses in the xmit_frame() path. That is, post the
>    TX-request and, just before returning, see if there are any more
>    responses in the RX-queue. This will minimize interrupt load (only
>    if the NIC firmware coalesces). The network core should drain the
>    responses rather than calling the drain-routine from the adapter's
>    xmit_frame() handler; this way there is no need to modify
>    individual xmit_frame handlers.

The problem with additional checking on such a hot path is that each
(synchronous) read over the PCIe bus takes ~1us, the same order of
cost as executing 1000 instructions (and the ratio worsens with faster
processors and deeper serial buses). It may be sufficiently cheap if
the NIC's RX queue status/structure lives in main memory (rather than
in registers read over PCIe).

If throughput is favoured over latency, increasing the packet
coalescing watermarks may reduce the interrupt rate and thus recover
some of the lost performance?

Daniel
--
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/