Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XXchipsets

From: Neil Horman
Date: Thu Apr 04 2013 - 13:51:51 EST


On Thu, Apr 04, 2013 at 11:14:30AM -0600, Bjorn Helgaas wrote:
> On Thu, Apr 4, 2013 at 9:39 AM, Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
> > On Thu, Apr 04, 2013 at 08:57:06AM -0600, Bjorn Helgaas wrote:
> >> On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
> >> > On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote:
> >> >> On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote:
> >> >> > );
> >> >> > > +
> >> >> > > + if ((revision == 0x13) && irq_remapping_enabled) {
> >> >> > > + pr_warn(HW_ERR "This system BIOS has enabled interrupt remapping\n"
> >> >> > > + "on a chipset that contains an errata making that\n"
> >> >> > > + "feature unstable. Please reboot with nointremap\n"
> >> >> > > + "added to the kernel command line and contact\n"
> >> >> > > + "your BIOS vendor for an update");
> >> >>
> >> >> This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'.
> >> >>
> >> > Ok, copy that. I'll repost shortly
> >>
> >> When you do, please include URLs for any problem reports or bugzillas you have.
> >>
> > Well, those are going to be vendor specific, so I'm not sure we can really do
> > that, at least not in any meaningful way.
>
> Sorry, I don't understand your point. It's useful to know who
> reported it (e.g., for future testers) and what happened and what
> bugzillas it solved. Of course it applies only to machines with this
> chipset and certain BIOS revisions.
>
Oh, you want the bug report that I'm fixing this against? Sure, I can do that.
I thought you wanted me to include a url in the WARN_TAINT, with which user
could report occurances of this bug. Yeah, the bug that this is reported in is:
https://bugzilla.redhat.com/show_bug.cgi?id=887006

Its standing in for about a dozen or so variants of this issue we've seen

> >> I assume Windows "just works" in this situation?
> > No more or less than linux does in this case. The Intel provided errata
> > indicates that the only acceptable workaround is to disable remapping in the
> > BIOS, so I would presume that if a windows system has a BIOS that doesn't
> > implement this fix, its just as exposed as we are.
>
> It sounds like the effect of this bug is that on Linux, devices may
> not work at all because of lost interrupts. Either Windows must never
> enable remapping (so it never sees the bug), or it must be designed in
> a way that tolerates the problem. I can't believe these machines
> shipped with Windows certification if devices didn't work correctly.
>
I don't know what to tell you here. We explicitly asked intel about this, and
there was an effort to recode the interrupt migration code to properly manage
this situation, and intels response was "No, just disable it in BIOS", so we're
telling people to disable it in BIOS. You'd have to ask somebody at Microsoft
what Windows did or did not do about this problem.

> Either way, I don't understand why we can't make the quirk just fix
> this. Booting with "nointremap" only sets disable_irq_remap, which is
> only used by irq_remapping_supported(). Early quirks are run before
> irq_remapping_supported () is ever called, so an early quirk ought to
> be just as effective as the command line option. Here's the relevant
> call tree I see:
>
> start_kernel
> setup_arch
> parse_early_param
> early_quirks
> rest_init
> ...
> <first use of disable_irq_remap>
>
> The x86 setup_arch() does call generic_apic_probe(), but as far as I
> can tell, none of the APIC .probe() methods reference
> disable_irq_remap, so that doesn't look like a problem.
>
Well, I can give it a try, but I'm sure something went wrong last time I did.
Regardless, theres also the security issue to consider here - namely that
disabling irq remapping opens up users of virt to a possible security bug
(potential irq injection). Some users may wish to live with the remapping
error, given that error typically leads to devices that need to be
restarted/reset to start working again, rather than live with the security hole.
I rather like the warning, that gives users a choice, but I'll spin up a version
that just disables it if you would rather.

Neil

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/