Re: [RFC 0/4] dynamically allocate arch specific system vectors

From: Eric W. Biederman
Date: Thu Sep 18 2008 - 20:35:40 EST


Jack Steiner <steiner@xxxxxxx> writes:

> On Wed, Sep 17, 2008 at 03:15:07PM -0700, Eric W. Biederman wrote:
>> Jack Steiner <steiner@xxxxxxx> writes:
>>
>> > On Wed, Sep 17, 2008 at 12:15:42PM -0700, H. Peter Anvin wrote:
>> >> Dean Nelson wrote:
>> >> >
>> >> > sgi-gru driver
>> >> >
>> >> >The GRU is not an actual external device that is connected to an IOAPIC.
>> >> >The GRU is a hardware mechanism that is embedded in the node controller
>> >> >(UV hub) that directly connects to the cpu socket. Any cpu (with
>> >> >permission) can do direct loads and stores to the GRU. Some of these
>> >> >stores will result in an interrupt being sent back to the cpu that did
>> >> >the store.
>> >> >
>> >> >The interrupt vector used for this interrupt is not in an IOAPIC. Instead
>> >> >it must be loaded into the GRU at boot or driver initialization time.
>> >> >
>> >>
>> >> Could you clarify there: is this one vector number per CPU, or are you
>> >> issuing a specific vector number and just varying the CPU number?
>> >
>> > It is one vector for each cpu.
>> >
>> > It is more efficient for software if the vector # is the same for all cpus.
>> Why? Especially in terms of irq counting that would seem to lead to cache
>> line conflicts.
>
> Functionally, it does not matter. However, if the IRQ is not a per-cpu IRQ,
> a very large number of IRQs (and vectors) may be needed: the GRU requires
> 32 interrupt lines on each blade, and a large system can currently support
> up to 512 blades, i.e. up to 16384 interrupt lines.

Every vendor of high-end hardware is saying they intend to provide
1 or 2 queues per cpu and 1 irq per queue, so the GRU is not special
in that regard. A very large number of IRQs also stops being a problem
as soon as we start dynamically allocating them, and that work is
currently in progress.

Once we start dynamically allocating irq_desc structures we can put
them in node-local memory and guarantee there is no data shared between
cpus.
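
Roughly like this (sketch only; it assumes the dynamic irq_desc patches
land, and alloc_irq_desc_for_cpu is a made-up name):

	#include <linux/irq.h>
	#include <linux/slab.h>
	#include <linux/topology.h>

	/* Back the descriptor with memory on the node of the cpu that
	 * will service the irq, so the per-irq counters never bounce
	 * cache lines between nodes. */
	static struct irq_desc *alloc_irq_desc_for_cpu(unsigned int cpu)
	{
		return kzalloc_node(sizeof(struct irq_desc), GFP_KERNEL,
				    cpu_to_node(cpu));
	}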

> After looking through the MSI code, we are starting to believe that we
> should separate the GRU requirements from the XPC requirements. It looks
> like XPC can easily use the MSI infrastructure. XPC needs a small number
> of IRQs, and interrupts are typically targeted to a single cpu. They can
> also be retargeted using the standard methods.

Alright.

I would be completely happy if there were interrupts whose affinity we
cannot change, and which are always targeted at a single cpu.
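
For the XPC case that could be as simple as this (sketch only; it
assumes the irq has already been allocated, and xpc_interrupt is a
made-up name):

	#include <linux/interrupt.h>

	static irqreturn_t xpc_interrupt(int irq, void *dev_id)
	{
		/* dispatch to XPC */
		return IRQ_HANDLED;
	}

	static int xpc_setup_irq(unsigned int irq)
	{
		/* IRQF_NOBALANCING keeps irqbalance and writes to
		 * /proc/irq/<n>/smp_affinity from retargeting the irq */
		return request_irq(irq, xpc_interrupt, IRQF_NOBALANCING,
				   "xpc", NULL);
	}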

> The GRU, OTOH, is more like a timer interrupt or a co-processor
> interrupt. GRU interrupts can occur on any cpu using the GRU. When
> interrupts do occur, all that needs to happen is to call an interrupt
> handler. I'm thinking of something like the following:
>
> - permanently reserve 2 system vectors in include/asm-x86/irq_vectors.h
> - in uv_system_init(), call alloc_intr_gate() to route the
>   interrupts to a function in the file containing uv_system_init().
> - initialize the GRU chipset with the vector, etc, ...
> - if an interrupt occurs and the GRU driver is NOT loaded, print
>   an error message (rate limited or one time)
> - provide a special UV hook for the GRU driver to register/deregister
>   a special callback function for GRU interrupts

That would work. So far the GRU doesn't sound that special.
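
For the register/deregister hook I imagine something like this (sketch
only; uv_register_gru_handler and friends are made-up names):

	#include <linux/errno.h>
	#include <linux/kernel.h>
	#include <asm/apic.h>

	/* the GRU driver installs its handler here */
	static void (*uv_gru_handler)(struct pt_regs *regs);

	int uv_register_gru_handler(void (*handler)(struct pt_regs *))
	{
		/* refuse a second registration */
		if (cmpxchg(&uv_gru_handler, NULL, handler) != NULL)
			return -EBUSY;
		return 0;
	}

	void uv_unregister_gru_handler(void)
	{
		uv_gru_handler = NULL;
	}

	/* entered through the vector installed with alloc_intr_gate() */
	void uv_gru_interrupt(struct pt_regs *regs)
	{
		void (*handler)(struct pt_regs *) = uv_gru_handler;

		ack_APIC_irq();
		if (handler)
			handler(regs);
		else if (printk_ratelimit())
			printk(KERN_ERR "GRU interrupt but no driver loaded\n");
	}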

For a lot of this I would much rather solve the general case, giving us
a solution that works for all high-end interrupts, than adopt one
specific solution just for the GRU, especially since it looks like we
already have most of the infrastructure needed for the general case,
while the GRU-specific solution would have to be developed and reviewed
from scratch.

Eric