Re: [PATCH] pci: Enable overrides for missing ACS capabilities

From: Alex Williamson
Date: Wed Jun 26 2013 - 15:03:28 EST


On Mon, 2013-06-24 at 11:43 -0600, Bjorn Helgaas wrote:
> On Wed, Jun 19, 2013 at 6:43 AM, Don Dutile <ddutile@xxxxxxxxxx> wrote:
> > On 06/18/2013 10:52 PM, Bjorn Helgaas wrote:
> >>
> >> On Tue, Jun 18, 2013 at 5:03 PM, Don Dutile<ddutile@xxxxxxxxxx> wrote:
> >>>
> >>> On 06/18/2013 06:22 PM, Alex Williamson wrote:
> >>>>
> >>>>
> >>>> On Tue, 2013-06-18 at 15:31 -0600, Bjorn Helgaas wrote:
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 18, 2013 at 12:20 PM, Alex Williamson
> >>>>> <alex.williamson@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>>
> >>>>>> On Tue, 2013-06-18 at 11:28 -0600, Bjorn Helgaas wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, May 30, 2013 at 12:40:19PM -0600, Alex Williamson wrote:
> >>>
> >>> ...
> >>>>>>>
> >>>>>>> Who do you expect to decide whether to use this option? I think it
> >>>>>>> requires intimate knowledge of how the device works.
> >>>>>>>
> >>>>>>> I think the benefit of using the option is that it makes assignment
> >>>>>>> of
> >>>>>>> devices to guests more flexible, which will make it attractive to
> >>>>>>> users.
> >>>>>>> But most users have no way of knowing whether it's actually *safe* to
> >>>>>>> use this. So I worry that you're adding an easy way to pretend
> >>>>>>> isolation
> >>>>>>> exists when there's no good way of being confident that it actually
> >>>>>>> does.
> >>
> >>
> >>> ...
> >>
> >>
> >>>>> I wonder if we should taint the kernel if this option is used (but not
> >>>>> for specific devices added to pci_dev_acs_enabled[]). It would also
> >>>>> be nice if pci_dev_specific_acs_enabled() gave some indication in
> >>>>> dmesg for the specific devices you're hoping to add to
> >>>>> pci_dev_acs_enabled[]. It's not an enumeration-time quirk right now,
> >>>>> so I'm not sure how we'd limit it to one message per device.
> >>>>
> >>>>
> >>>> Right, setup vs use and getting single prints is a lot of extra code.
> >>>> Tainting is troublesome for support, Don had some objections when I
> >>>> suggested the same to him.
> >>>>
> >>> For RH GSS (Global Support Services), a 'taint' in the kernel printk
> >>> means
> >>> RH doesn't support that system. The 'non-support' due to 'taint' being
> >>> printed
> >>> out in this case may be incorrect -- RH may support that use, at least
> >>> until
> >>> a more sufficient patched kernel is provided.
> >>> Thus my dissension that 'taint' be output. WARN is ok. 'driver beware',
> >>> 'unleashed dog afoot'.... sure...
> >>
> >>
> >> So ... that's really a RH-specific support issue, and easily worked
> >> around by RH adding a patch that turns off tainting.
> >>
> > sure. what's another patch to the thousands... :-/
> >
> >> It still sounds like a good idea to me for upstream, where use of this
> >> option can very possibly lead to corruption or information leakage
> >> between devices the user claimed were isolated, but in fact were not.
> >
> > Did I miss something? This patch provides a user-level/chosen override;
> > like all other overrides, (pci=realloc, etc.), it can lead to a failing
> > system.
> > IMO, this patch is no different. If you want to tag this patch with taint,
> > then let's audit all the (PCI) overrides and taint them appropriately.
> > Taint should be reserved to changes to the kernel that were done outside
> > the development of the kernel, or with the explicit intent to circumvent
> > the normal operation of the kernel. This patch provides a way to enable
> > ACS checking to succeed when the devices have not provided sufficiently
> > complete
> > ACS information. i.e., it's a growth path for PCIe-ACS and its need for
> > proper support.
>
> We're telling the kernel to assume something (the hardware provides
> protection) that may not be true. If that assumption turns out to be
> false, the result is that a VM can be crashed or comprised by another
> VM.
>
> One difference I see is that this override can lead to a crash that
> looks like random memory corruption and has no apparent connection to
> the actual cause. Most other overrides won't cause run-time crashes
> (I think they're more likely to cause boot or device configuration
> failures), and the dmesg log will probably have good clues as to the
> reason.
>
> But the possibility of compromise is probably even more serious,
> because there would be no crash at all, and we'd have no indication
> that VM A read or corrupted data in VM B. I'm very concerned about
> that, enough so that it's not clear to me that an override belongs in
> the upstream kernel at all.
>
> Yes, that would mean some hardware is not suitable for device
> assignment. That just sounds like "if hardware manufacturers do their
> homework and support ACS properly, their hardware is more useful for
> virtualization than other hardware." I don't see the problem with
> that.

That's easy to say for someone that doesn't get caught trying to explain
this to users over and over. In many cases devices don't do
peer-to-peer and missing ACS is an oversight. I imagine that quite a
few vendors also see the ACS capability as a means to allow control of
ACS and therefore see it as a much larger investment that just providing
an empty ACS structure in config space to indicate the lack of
peer-to-peer.

Even if we taint the kernel when this is enabled and add extremely
verbose warnings in kernel-parameters.txt, I think there's value to
providing an on-the-spot workaround to users. In many cases we're not
going to get a response from vendors. Removing the
downstream/multifunction catch-alls might be another way to raise the
bar for this kind of override. Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/