Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

From: Alexey Kardashevskiy
Date: Fri May 06 2016 - 02:35:55 EST


On 05/06/2016 01:05 AM, Alex Williamson wrote:
On Thu, 5 May 2016 12:15:46 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

From: Yongji Xie [mailto:xyjxie@xxxxxxxxxxxxxxxxxx]
Sent: Thursday, May 05, 2016 7:43 PM

Hi David and Kevin,

On 2016/5/5 17:54, David Laight wrote:

From: Tian, Kevin
Sent: 05 May 2016 10:37
...
Actually, we are not aiming to access the MSI-X table from the
guest. So I think it's safe to pass through the MSI-X table if we
can make sure the guest kernel does not touch it in the normal
code path, as with a para-virtualized guest kernel on PPC64.

Then how do you prevent a malicious guest kernel from accessing it?
Or a malicious guest driver for an ethernet card setting up
the receive buffer ring to contain a single word entry that
contains the address associated with an MSI-X interrupt and
then using a loopback mode to cause a specific packet to be
received that writes the required word through that address.

Remember the PCIe cycle for an interrupt is a normal memory write
cycle.

David


If we have enough permission to load a malicious driver or
kernel, we can easily break the guest even without an exposed
MSI-X table.

I think it should be safe to expose the MSI-X table if we can
make sure that a malicious guest driver/kernel can't use
the MSI-X table to break other guests or the host. The
capability of IRQ remapping could provide this
kind of protection.


Having IRQ remapping doesn't mean you can pass the MSI-X structure
through to the guest. I know actual IRQ remapping might be platform
specific, but at least for the Intel VT-d specification, an MSI-X
entry must be configured by the host kernel in a remappable format
which contains an index into the IRQ remapping table. The index
selects an IRQ remapping entry which controls interrupt routing for a
specific device. If you allow a malicious program to write an arbitrary
index into the MSI-X entry of an assigned device, the hole is obvious...
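
As a rough illustration of the remappable format mentioned above, the
sketch below packs and unpacks the interrupt index in the low 32 bits of
an MSI-X message address. The bit positions (index[14:0] in bits 19:5,
index[15] in bit 2, format flag in bit 4) are assumed from the VT-d
remappable layout; all macro and function names are hypothetical and are
not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants only; names are made up for this sketch. */
#define MSI_ADDR_BASE        0xFEE00000u  /* x86 MSI address window */
#define MSI_ADDR_REMAPPABLE  (1u << 4)    /* 1 = remappable format  */

/* Build the low address word for a given remapping-table index. */
uint32_t ir_compose_addr(uint16_t index)
{
    return MSI_ADDR_BASE | MSI_ADDR_REMAPPABLE |
           ((uint32_t)(index & 0x7FFF) << 5) |   /* index[14:0] -> bits 19:5 */
           ((uint32_t)(index & 0x8000) >> 13);   /* index[15]   -> bit 2     */
}

/* Recover the remapping-table index from a low address word. */
uint16_t ir_extract_index(uint32_t addr)
{
    assert((addr >> 20) == 0xFEE);               /* must be in the MSI window */
    return (uint16_t)(((addr >> 5) & 0x7FFF) |
                      (((addr >> 2) & 1u) << 15));
}
```

Whoever can write this word chooses the index, which is the concern
raised here: a guest with write access to the entry could substitute an
index the host never assigned to it.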

The above might make sense only for an IRQ remapping implementation
which doesn't rely on an extended MSI-X format (e.g. simply based on
BDF). If that's the case for PPC, then you should build MSI-X
passthrough on this fact rather than on whether general IRQ remapping
is enabled or not.

I don't think anyone is expecting that we can expose the MSI-X vector
table to the guest and the guest can make direct use of it. The end
goal here is that the guest on a power system is already
paravirtualized to not program the device MSI-X by directly writing to
the MSI-X vector table. They have hypercalls for this since they
always run virtualized. Therefore a) they never intend to touch the
MSI-X vector table and b) they have sufficient isolation that a guest
can only hurt itself by doing so.

On x86 we don't have a), our method of programming the MSI-X vector
table is to directly write to it. Therefore we will always require QEMU
to place a MemoryRegion over the vector table to intercept those
accesses. However with interrupt remapping, we do have b) on x86, which
means that we don't need to be so strict in disallowing user accesses
to the MSI-X vector table. It's not useful for configuring MSI-X on
the device, but the user should only be able to hurt themselves by
writing it directly. x86 doesn't really get anything out of this
change, but it helps this special case on power pretty significantly
aiui. Thanks,
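
The two conditions a) and b) above can be read as two independent
decisions. The sketch below is only a way of writing that reasoning
down; the struct and function names are hypothetical and do not
correspond to any vfio or QEMU interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform description; not a real kernel or QEMU type. */
struct msix_platform {
    bool guest_uses_hypercalls; /* a) guest never writes the vector table  */
    bool irq_isolation;         /* b) a guest can only hurt itself by it   */
};

/* Direct user access to the table is tolerable only when b) holds. */
bool msix_table_mmap_ok(const struct msix_platform *p)
{
    assert(p);
    return p->irq_isolation;
}

/* The VMM must intercept table writes whenever a) does not hold,
 * because the guest then programs MSI-X by writing the table itself. */
bool msix_table_needs_trap(const struct msix_platform *p)
{
    assert(p);
    return !p->guest_uses_hypercalls;
}
```

Under these assumptions a power guest gets an mmap-able table with no
trap, while x86 with interrupt remapping may allow the mmap but still
needs the trap to configure MSI-X.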

Excellent short overview, saved :)

How do we proceed with these patches? Nobody seems to object to them, but nobody seems to be taking them either...




--
Alexey