Re: [PATCHv2 3/4] pci: Determine actual VPD size on first access

From: Rustad, Mark D
Date: Mon Aug 15 2016 - 19:17:04 EST


Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:

> Filtering things to work around bugs in existing guests to avoid crashes
> is a different kettle of fish and could be justified but keep in mind that
> in most cases a malicious guest will be able to exploit those HW flaws.

Bugs in existing guests are an interesting case, but I have been focused on getting acceptable behavior from a properly functioning guest in the face of hardware issues that can only be resolved in a single place.

I agree that a malicious guest can cause all kinds of havoc with directly-assigned devices. Consider a 4-port PHY chip on a shared MDIO bus, for instance. There is really nothing to be done about the potential for mischief with that kind of thing.

The VPD problem that I had been concerned about arises from a bad design in the PCI spec together with implementations that share the VPD registers across functions. The hardware isn't going to change and I really doubt that the spec will either, so we address it in the only place we can.
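
To make the hazard concrete, here is a minimal sketch of it, not kernel code. It assumes the usual VPD handshake: software writes a 15-bit address with the flag bit clear, the device sets the flag when the data register is valid, and software then reads the data register. The names struct shared_vpd, vpd_start_read() and vpd_finish_read() are hypothetical, and the model fetches the data immediately, whereas real hardware completes asynchronously. The only point is that one register pair behind two functions lets the second request overwrite the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VPD_ADDR_MASK 0x7fffu /* low 15 bits: VPD byte address */
#define VPD_ADDR_F    0x8000u /* bit 15: transfer-complete flag */

/* Hypothetical model: ONE address/data register pair behind every function. */
struct shared_vpd {
    uint16_t addr;        /* VPD Address register, flag in bit 15 */
    uint32_t data;        /* VPD Data register */
    uint8_t  storage[64]; /* backing VPD bytes, storage[i] == i */
};

static void vpd_init(struct shared_vpd *v)
{
    v->addr = 0;
    v->data = 0;
    for (size_t i = 0; i < sizeof(v->storage); i++)
        v->storage[i] = (uint8_t)i;
}

/* Software writes the address with the flag clear; the modelled device
 * fetches four bytes (little-endian) and sets the flag at once. */
static void vpd_start_read(struct shared_vpd *v, uint16_t addr)
{
    uint16_t a = addr & VPD_ADDR_MASK;

    assert((size_t)a + 4 <= sizeof(v->storage));
    v->data = (uint32_t)v->storage[a]
            | (uint32_t)v->storage[a + 1] << 8
            | (uint32_t)v->storage[a + 2] << 16
            | (uint32_t)v->storage[a + 3] << 24;
    v->addr = (uint16_t)(a | VPD_ADDR_F);
}

/* Software sees the flag set and collects the data register. */
static uint32_t vpd_finish_read(const struct shared_vpd *v)
{
    assert(v->addr & VPD_ADDR_F);
    return v->data;
}

/* Function 0 asks for 0x10, function 1 asks for 0x20 before function 0
 * has collected its data, so function 0 gets function 1's bytes. */
uint32_t interleaved_read(void)
{
    struct shared_vpd v;

    vpd_init(&v);
    vpd_start_read(&v, 0x10);   /* function 0 */
    vpd_start_read(&v, 0x20);   /* function 1 clobbers the request */
    return vpd_finish_read(&v); /* function 0 reads the wrong data */
}

/* Same two requests, serialized in one place: each completes before
 * the next one starts, so function 0 gets its own bytes. */
uint32_t serialized_read(void)
{
    struct shared_vpd v;
    uint32_t fn0;

    vpd_init(&v);
    vpd_start_read(&v, 0x10);
    fn0 = vpd_finish_read(&v);
    vpd_start_read(&v, 0x20);
    (void)vpd_finish_read(&v);
    return fn0;
}
```

In this model interleaved_read() returns the bytes at 0x20 to the caller that asked for 0x10, while serialized_read() returns the right ones. No individual function or guest can provide that serialization by itself, which is why it has to live in one place that sees all the accesses.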

I am certain that we agree that not everything can or should be addressed in vfio. I did not mean to suggest it should try to address everything, but I think it should make it possible for correctly behaving guests to work. I think that is not unreasonable.

Perhaps the VPD range check should really have been implemented only for the sysfs interface, leaving the vfio case unchecked. I don't know, because I was not involved in that issue. Perhaps someone more intimately involved can comment on that notion.

> Assuming that a device coming back from a guest is functional and not
> completely broken and can be re-used without a full PERST or power cycle
> is a wrong assumption. It may or may not work, no amount of "filtering"
> will fix the fundamental issue. If your HW won't give you access to PERST
> well ... blame Intel for not specifying a standard way to generate it in
> the first place :-)

Yeah, I worry about the state that a malicious guest could leave a device in, but I consider direct assignment always risky anyway. I would just like it to at least work in the non-malicious guest cases.

I guess my previous response was really just too terse. I was focused on unavoidable hangs and data corruption, which were happening even without any guest involvement. For me, guests were just an additional exposure of the same underlying issue.

With hindsight, it is easy to see that a standard reset would now be a pretty useful thing. I am sure that even if it existed, we would now have lots and lots of quirks around it as well! :-)

--
Mark Rustad, Networking Division, Intel Corporation
