Re: do_IRQ: 0.165 No irq handler for vector (irq -1)

From: Suresh Siddha
Date: Sat Feb 13 2010 - 13:19:38 EST


On Sat, 2010-02-13 at 02:25 -0700, Torsten Kaiser wrote:
> Ping?
>
> I reported this problem one day after -rc1 was out and it's still
> there in -rc8, probably the last -rc for 2.6.33.
> (I also reported it against -rc2, -rc3, -rc4 and -rc6)
>
> Apart from the patches related to the SiI register HOST_CTRL_MSIACK
> (that did not fix the problem) I have the feeling, that I'm not one
> step further to any fix.
>
> Is this a bug in the MSI-enable code in sata_sil24?
> Is this a bug in the MSI code in libata?
> Is this a bug in the IRQ system?
> Is this a bug in the x86 apic code?

There are primarily two issues you reported.

One is the spurious interrupt issue (for which you see the "No irq
handler for vector" messages). Your experiments verified that this
problem doesn't happen in physical apic mode. This shows that there is
some problem with the way this HW subsystem (involving sata_sil24)
handles logical mode. Most likely there is a bug either in sata_sil24 or
in the platform paths (bridges etc.) handling the sata_sil24 interrupts
(as you say, other devices work fine with MSI on this platform).
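
For the report, it may help to pull the CPU and vector out of these
messages mechanically. The subject line "do_IRQ: 0.165 ..." reads as
CPU 0, vector 165. A minimal sketch, assuming the log line has the same
shape as the one in the subject (parse_no_handler is a hypothetical
helper name, not a kernel or library API):

```python
import re

# Matches the message from the subject line: "<cpu>.<vector>" follows
# "do_IRQ:", and the irq number (often -1) is in the trailing parentheses.
_NO_HANDLER = re.compile(
    r"do_IRQ: (?P<cpu>\d+)\.(?P<vector>\d+) No irq handler for vector"
    r"(?: \(irq (?P<irq>-?\d+)\))?"
)


def parse_no_handler(line):
    """Return cpu, vector and irq from a do_IRQ complaint, or None."""
    m = _NO_HANDLER.search(line)
    if m is None:
        return None
    irq = m.group("irq")
    return {
        "cpu": int(m.group("cpu")),
        "vector": int(m.group("vector")),
        "irq": None if irq is None else int(irq),
    }


if __name__ == "__main__":
    print(parse_no_handler("do_IRQ: 0.165 No irq handler for vector (irq -1)"))
```

Feeding dmesg output through this line by line gives a quick tally of
which CPUs and vectors are affected.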

And the second problem is the sata timeouts (which happen irrespective
of the above spurious interrupts). It looks like interrupts are dropped
(which might be the reason why your ERR count -- apic error count --
increases).
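
To correlate the timeouts with the apic error count, the ERR line of
/proc/interrupts can be sampled before and after a test run. A rough
sketch, assuming the usual x86 layout where that line starts with
"ERR:" followed by the count (the function names are made up for
illustration):

```python
def apic_error_count(interrupts_text):
    """Return the count from the "ERR:" line, or None if there is none."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ERR:":
            return int(fields[1])
    return None


def read_apic_error_count(path="/proc/interrupts"):
    """Read the current count from the running system (x86 Linux only)."""
    with open(path) as f:
        return apic_error_count(f.read())
```

Calling read_apic_error_count() before and after provoking a timeout and
comparing the two values shows whether the errors track the timeouts.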

Based on your experimental results, we can say that this is not a bug
in the x86 apic code or the irq subsystem.

> Is this a hardware bug in the SiI 3132?
> Is this a hardware bug in the MCP55?
> Is this a fatal bug or does it just need the right quirk?
>
> What should I do now?
> Keep posting that it's still broken at each -rc?
> Open a bug at bugzilla.kernel.org? Against what subsystem?
> Should I just not use the sata_sil.msi=1 commandline?

You shouldn't use that command line, as your experiments showed that
sata_sil msi mode is clearly broken on this platform. Perhaps also
report the issue to the HW vendor (include in that report the spurious
vector 165 that you see in logical mode and also the apic error you
see; you can enable debug to see the error message that gets printed in
smp_error_interrupt() for this).
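
Once you have the error value, it can be decoded for the vendor report.
The sketch below assumes the value is the local APIC Error Status
Register and uses the bit meanings from the Intel SDM; decode_esr is a
hypothetical helper, and the exact format of the debug message is not
assumed here:

```python
# Local APIC Error Status Register bits 0..7, per the Intel SDM.
ESR_BITS = (
    "Send checksum error",
    "Receive checksum error",
    "Send accept error",
    "Receive accept error",
    "Redirectable IPI",
    "Send illegal vector",
    "Received illegal vector",
    "Illegal register address",
)


def decode_esr(value):
    """Return the names of the error bits set in an ESR value."""
    return [name for bit, name in enumerate(ESR_BITS) if value & (1 << bit)]


if __name__ == "__main__":
    import sys

    # Usage: pass the hex value from the debug message, e.g. "40".
    for arg in sys.argv[1:]:
        print(arg, decode_esr(int(arg, 16)))
```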

thanks,
suresh
