Re: [PATCH 5/5] x86: update nr_irqs according cpu num

From: Eric W. Biederman
Date: Fri Jan 08 2010 - 15:10:42 EST

"H. Peter Anvin" <hpa@xxxxxxxxx> writes:

> On 01/08/2010 11:11 AM, Eric W. Biederman wrote:
>> Yinghai Lu <yinghai@xxxxxxxxxx> writes:
>>> that is max number on run time.
>> Ouch! Unless I misread this code this will leave nr_irqs at
>> NR_IRQS_LEGACY. aka 16.

I goofed and misread this. I was looking at nr_irqs_gsi which
is initialized to 16.

We actually initialize nr_irqs to NR_IRQS, which has an unfortunately
convoluted formula that winds up being 8*NR_CPUS or 32*MAX_IO_APICS
in the extreme cases.

Since there are still arrays sized at NR_IRQS (bleh) we can not
increase nr_irqs to be greater than NR_IRQS.
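
To make the constraint concrete, here is a rough sketch (not the actual
header). The NR_CPUS and MAX_IO_APICS values are made up for
illustration, the NR_IRQS macro is a simplified stand-in that keeps only
the two extreme cases, and clamp_nr_irqs() is a hypothetical helper:

```c
#include <assert.h>

/* Hypothetical configuration values, chosen only for illustration. */
#define NR_CPUS       4096
#define MAX_IO_APICS  128

/* The two extreme cases mentioned above. */
#define CPU_VECTOR_LIMIT     (8 * NR_CPUS)
#define IO_APIC_VECTOR_LIMIT (32 * MAX_IO_APICS)

/* Simplified stand-in for the real formula: take the larger extreme. */
#define NR_IRQS (CPU_VECTOR_LIMIT > IO_APIC_VECTOR_LIMIT ? \
		 CPU_VECTOR_LIMIT : IO_APIC_VECTOR_LIMIT)

/*
 * Hypothetical helper: whatever value we would like for nr_irqs, it
 * must not exceed NR_IRQS while arrays are still sized at NR_IRQS.
 */
static long clamp_nr_irqs(long wanted)
{
	assert(wanted >= 0);
	return wanted > NR_IRQS ? NR_IRQS : wanted;
}
```

With these made-up values NR_IRQS is 8*4096 = 32768, so asking for
anything larger simply gets capped there.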

So YH, can you do the simple thing here and simply remove
arch_probe_nr_irqs()? Sane code doesn't care how big nr_irqs is, and
code that does care needs to be fixed.

>> Let's do something stupid and simple.
>> nr_irqs = nr_cpus_ids * 256; /* Semi-arbitrary number */
> This would be 1048576 on the biggest machines we currently support.
> Now, the number of IRQ *vectors* is limited to
> (224-system vectors)*(cpu count), so one could argue that if there is
> anything that is not semi-arbitrary it would be that number, but that
> doesn't account for vector sharing.

Except we have irq sources that we know about that are never utilized.
Think of unconnected inputs to ioapics.

I don't know if we ever actually perform vector sharing. The only case
I recall where the code could share vectors is if the firmware tables
told us two irq sources were the same interrupt. I don't think that
happens. We do have the remains of support for vector sharing
in the code but I don't think it is utilized. MSI interrupts certainly
can not share vectors.

My point with the semi-arbitrary number is that we should not think of
nr_irqs as something defined by the resources of the receivers of
interrupts. NR_IRQS has never been that. nr_irqs really is a limit
on how many interrupt sources we have.
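
For scale, the two numbers under discussion can be compared with a bit
of arithmetic. This is only a sketch: the function names are made up,
the count of system vectors is left as a parameter because it is not
given in this thread, and the 4096 in the note below is just
1048576 / 256.

```c
#include <assert.h>

/* Source-side, semi-arbitrary limit: nr_cpu_ids * 256. */
static long source_limit(long cpus)
{
	assert(cpus > 0);
	return cpus * 256;
}

/*
 * Receiver-side limit: (224 - system vectors) * (cpu count).
 * system_vectors is an assumption supplied by the caller.
 */
static long vector_limit(long cpus, long system_vectors)
{
	assert(cpus > 0);
	assert(system_vectors >= 0 && system_vectors <= 224);
	return (224 - system_vectors) * cpus;
}
```

For 4096 cpus, source_limit() gives 1048576, while vector_limit() can be
at most 224 * 4096 = 917504. The two measure different things, which is
the point: one bounds sources, the other bounds receivers.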

> Do we have any place which requires nr_irqs to be *stable*, or can we
> simply treat it as a high water mark for IRQ numbers used?

We have several loops that walk through the irq descriptors and look for
an unbound irq, which means having nr_irqs as a high water mark is not
going to work today.
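
The kind of loop I mean looks roughly like the sketch below. The names
here (struct irq_slot, slots, find_unbound_irq) are stand-ins for
illustration, not the real descriptor code, and NR_IRQS is a made-up
size:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_IRQS 64			/* hypothetical array size */

struct irq_slot {			/* stand-in, not struct irq_desc */
	bool bound;
};

static struct irq_slot slots[NR_IRQS];
static int nr_irqs = NR_IRQS;		/* upper bound for the search */

/* Return the first unbound irq in [from, nr_irqs), or -1 if none. */
static int find_unbound_irq(int from)
{
	assert(from >= 0);
	for (int irq = from; irq < nr_irqs; irq++)
		if (!slots[irq].bound)
			return irq;
	return -1;
}
```

If nr_irqs were only the highest irq number handed out so far plus one,
and every irq below it were bound, this search would come back empty
even though free slots exist above it.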

>> Ideally we would set "nr_irqs = 0x7fffffff;" but we have just enough
>> places using nr_irqs that I think those loops would get painful if we
>> were to do that.
> Ideally we should presumably get rid of nr_irqs completely?

Yes. It was enough of a pain the first pass at it that we wound
up with nr_irqs, a value that can vary at boot time.

Once YH's radix tree changes get in, a war on NR_IRQS and nr_irqs
seems appropriate.

