Re: [PATCH] pci: Enable overrides for missing ACS capabilities

From: Don Dutile
Date: Tue Jul 23 2013 - 14:38:59 EST

Next message: Bjorn Helgaas: "Re: [PATCH V2 6/6] pci: mvebu: Convert to use devm_ioremap_resource"
Previous message: Myklebust, Trond: "Re: Linux 3.11-rc2"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 06/26/2013 03:03 PM, Alex Williamson wrote:

On Mon, 2013-06-24 at 11:43 -0600, Bjorn Helgaas wrote:
On Wed, Jun 19, 2013 at 6:43 AM, Don Dutile<ddutile@xxxxxxxxxx> wrote:
On 06/18/2013 10:52 PM, Bjorn Helgaas wrote:

On Tue, Jun 18, 2013 at 5:03 PM, Don Dutile<ddutile@xxxxxxxxxx> wrote:

On 06/18/2013 06:22 PM, Alex Williamson wrote:

On Tue, 2013-06-18 at 15:31 -0600, Bjorn Helgaas wrote:

On Tue, Jun 18, 2013 at 12:20 PM, Alex Williamson
<alex.williamson@xxxxxxxxxx> wrote:

On Tue, 2013-06-18 at 11:28 -0600, Bjorn Helgaas wrote:

On Thu, May 30, 2013 at 12:40:19PM -0600, Alex Williamson wrote:

...

Who do you expect to decide whether to use this option? I think it
requires intimate knowledge of how the device works.

I think the benefit of using the option is that it makes assignment
of
devices to guests more flexible, which will make it attractive to
users.
But most users have no way of knowing whether it's actually *safe* to
use this. So I worry that you're adding an easy way to pretend
isolation
exists when there's no good way of being confident that it actually
does.

...

I wonder if we should taint the kernel if this option is used (but not
for specific devices added to pci_dev_acs_enabled[]). It would also
be nice if pci_dev_specific_acs_enabled() gave some indication in
dmesg for the specific devices you're hoping to add to
pci_dev_acs_enabled[]. It's not an enumeration-time quirk right now,
so I'm not sure how we'd limit it to one message per device.

Right, setup vs use and getting single prints is a lot of extra code.
Tainting is troublesome for support, Don had some objections when I
suggested the same to him.

For RH GSS (Global Support Services), a 'taint' in the kernel printk
means
RH doesn't support that system. The 'non-support' due to 'taint' being
printed
out in this case may be incorrect -- RH may support that use, at least
until
a more sufficient patched kernel is provided.
Thus my dissension that 'taint' be output. WARN is ok. 'driver beware',
'unleashed dog afoot'.... sure...

So ... that's really a RH-specific support issue, and easily worked
around by RH adding a patch that turns off tainting.

sure. what's another patch to the thousands... :-/

It still sounds like a good idea to me for upstream, where use of this
option can very possibly lead to corruption or information leakage
between devices the user claimed were isolated, but in fact were not.

Did I miss something? This patch provides a user-level/chosen override;
like all other overrides, (pci=realloc, etc.), it can lead to a failing
system.
IMO, this patch is no different. If you want to tag this patch with taint,
then let's audit all the (PCI) overrides and taint them appropriately.
Taint should be reserved to changes to the kernel that were done outside
the development of the kernel, or with the explicit intent to circumvent
the normal operation of the kernel. This patch provides a way to enable
ACS checking to succeed when the devices have not provided sufficiently
complete
ACS information. i.e., it's a growth path for PCIe-ACS and its need for
proper support.

We're telling the kernel to assume something (the hardware provides
protection) that may not be true. If that assumption turns out to be
false, the result is that a VM can be crashed or comprised by another
VM.

One difference I see is that this override can lead to a crash that
looks like random memory corruption and has no apparent connection to
the actual cause. Most other overrides won't cause run-time crashes
(I think they're more likely to cause boot or device configuration
failures), and the dmesg log will probably have good clues as to the
reason.

But the possibility of compromise is probably even more serious,
because there would be no crash at all, and we'd have no indication
that VM A read or corrupted data in VM B. I'm very concerned about
that, enough so that it's not clear to me that an override belongs in
the upstream kernel at all.

Yes, that would mean some hardware is not suitable for device
assignment. That just sounds like "if hardware manufacturers do their
homework and support ACS properly, their hardware is more useful for
virtualization than other hardware." I don't see the problem with
that.

That's easy to say for someone that doesn't get caught trying to explain
this to users over and over. In many cases devices don't do
peer-to-peer and missing ACS is an oversight. I imagine that quite a
few vendors also see the ACS capability as a means to allow control of
ACS and therefore see it as a much larger investment that just providing
an empty ACS structure in config space to indicate the lack of
peer-to-peer.

+1 -- A good explanation why the workaround is needed.
In fact, until recently, we didn't understand that an empty ACS meant
'no p2p' and not just an empty ACS with no control to disable p2p. :-/

Even if we taint the kernel when this is enabled and add extremely
verbose warnings in kernel-parameters.txt, I think there's value to
providing an on-the-spot workaround to users. In many cases we're not
going to get a response from vendors. Removing the
downstream/multifunction catch-alls might be another way to raise the
bar for this kind of override. Thanks,

I'll re-state what I said previously (maybe privately): there are numerous
ways to crash a system with kernel params. Agreed, that in most instances,
the params cause the system not to boot at all if they don't work/help, but
arguing to taint the kernel in this case is like asking rm to detect 'rm -rf /'
and flag the user you're trashing your system .... and not temporarily, or in
a recoverable way.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Bjorn Helgaas: "Re: [PATCH V2 6/6] pci: mvebu: Convert to use devm_ioremap_resource"
Previous message: Myklebust, Trond: "Re: Linux 3.11-rc2"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]