Re: [PATCH] xen/virtio: Handle PCI devices which Host controller is described in DT

From: Oleksandr Tyshchenko
Date: Fri Oct 21 2022 - 00:44:35 EST



On 19.10.22 23:38, Stefano Stabellini wrote:

Hello Stefano

> + Vikram
>
> On Wed, 19 Oct 2022, Oleksandr Tyshchenko wrote:
>>>> Regarding the virtio-mmio (platform) devices, yes, we could expose them
>>>> with status "disabled", and they won't get probed by default.
>>>> To be honest, I have experimented with that, when I was thinking of
>>>> possible hotplug for virtio-mmio devices (I know, this sounds uncommon
>>>> and strange).
>>>> I used Linux feature (CONFIG_OF_DYNAMIC, overlays) to update the
>>>> device-tree on running guest, so the toolstack initially inserts
>>>> virtio-mmio device nodes for non-boot devices
>>>> with status "disabled", and at the runtime, once we receive an event for
>>>> example, we change the status to "ok" and the corresponding virtio-mmio
>>>> device gets probed.
>>>> But again, it is not a 100% hotplug, as we need to pre-allocate memory
>>>> range and interrupt in advance (when generating guest device tree).
>>> Actually this is really cool! Does it work? It doesn't matter to me if
>>> the virtio devices are pci or mmio as long as we can solve the "wait"
>>> problem. So this could be a good solution.
>>
>> ... yes, it does. Initially I experimented with virtio-mmio devices, but
>> today I tried with PCI host bridge as well.
>> I won't describe the commands which I used to apply/remove device-tree
>> overlays from the userspace as well as the context of
>> dtso files I created, I will describe how that could be done from the
>> kernel by using existing functionality (CONFIG_OF_DYNAMIC).
>>
>> As I said if we exposed the devices with status "disabled", they
>> wouldn't get probed by default. Once we receive an signal
>> that otherend is ready, we change the status to "ok" and the
>> corresponding device gets probed.
>>
>> So below the test patch, which just change the status of the required
>> device-tree node (as you can see the code to update the property is
>> simple enough),
>> I hacked "xl sysrq" for the convenience of testing.
>>
>>
>> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
>> index 045c1805b2d5..9683ce075bc9 100644
>> --- a/drivers/xen/grant-dma-ops.c
>> +++ b/drivers/xen/grant-dma-ops.c
>> @@ -10,6 +10,7 @@
>>  #include <linux/module.h>
>>  #include <linux/dma-map-ops.h>
>>  #include <linux/of.h>
>> +#include <linux/of_platform.h>
>>  #include <linux/pci.h>
>>  #include <linux/pfn.h>
>>  #include <linux/xarray.h>
>> @@ -379,6 +380,108 @@ bool xen_is_grant_dma_device(struct device *dev)
>>         return false;
>>  }
>>
>> +/* TODO: Consider using statically allocated (struct property status) */
>> +static int xen_grant_dma_enable_device(struct device_node *np)
>> +{
>> +       struct property *status;
>> +
>> +       status = kzalloc(sizeof(*status), GFP_KERNEL);
>> +       if (!status)
>> +               return -ENOMEM;
>> +
>> +       status->name = kstrdup("status", GFP_KERNEL);
>> +       if (!status->name)
>> +               return -ENOMEM;
>> +
>> +       status->value = kstrdup("okay", GFP_KERNEL);
>> +       if (!status->value)
>> +               return -ENOMEM;
>> +
>> +       status->length = sizeof("okay");
>> +
>> +       return of_update_property(np, status);
>> +}
>> +
>> +static int xen_grant_dma_disable_device(struct device_node *np)
>> +{
>> +       struct property *status;
>> +
>> +       status = kzalloc(sizeof(*status), GFP_KERNEL);
>> +       if (!status)
>> +               return -ENOMEM;
>> +
>> +       status->name = kstrdup("status", GFP_KERNEL);
>> +       if (!status->name)
>> +               return -ENOMEM;
>> +
>> +       status->value = kstrdup("disabled", GFP_KERNEL);
>> +       if (!status->value)
>> +               return -ENOMEM;
>> +
>> +       status->length = sizeof("disabled");
>> +
>> +       return of_update_property(np, status);
>> +}
>> +
>> +void xen_grant_dma_handle_sysrq(int key)
>> +{
>> +       struct device_node *np;
>> +       const char *path;
>> +       bool enable;
>> +
>> +       printk("%s: got key %d\n", __func__, key);
>> +
>> +       switch (key) {
>> +       case '0':
>> +               path = "/virtio@2000000";
>> +               enable = true;
>> +               break;
>> +
>> +       case '1':
>> +               path = "/virtio@2000200";
>> +               enable = true;
>> +               break;
>> +
>> +       case '2':
>> +               path = "/virtio@2000000";
>> +               enable = false;
>> +               break;
>> +
>> +       case '3':
>> +               path = "/virtio@2000200";
>> +               enable = false;
>> +               break;
>> +
>> +       case '4':
>> +               path = "/pcie@10000000";
>> +               enable = true;
>> +               break;
>> +
>> +       case '5':
>> +               path = "/pcie@10000000";
>> +               enable = false;
>> +               break;
>> +
>> +       default:
>> +               printk("%s: wrong key %d\n", __func__, key);
>> +               return;
>> +       }
>> +
>> +       np = of_find_node_by_path(path);
>> +       if (!np) {
>> +               printk("%s: failed to find node by path %s\n", __func__,
>> path);
>> +               return;
>> +       }
>> +
>> +       if (enable) {
>> +               xen_grant_dma_enable_device(np);
>> +               printk("%s: enable %s\n", __func__, path);
>> +       } else {
>> +               xen_grant_dma_disable_device(np);
>> +               printk("%s: disable %s\n", __func__, path);
>> +       }
>> +}
>> +
>>  bool xen_virtio_mem_acc(struct virtio_device *dev)
>>  {
>>         if (IS_ENABLED(CONFIG_XEN_VIRTIO_FORCE_GRANT) || xen_pv_domain())
>> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
>> index c16df629907e..6df96be1ea40 100644
>> --- a/drivers/xen/manage.c
>> +++ b/drivers/xen/manage.c
>> @@ -308,7 +308,8 @@ static void sysrq_handler(struct xenbus_watch
>> *watch, const char *path,
>>                 goto again;
>>
>>         if (sysrq_key != '\0')
>> -               handle_sysrq(sysrq_key);
>> +               /*handle_sysrq(sysrq_key);*/
>> +               xen_grant_dma_handle_sysrq(sysrq_key);
>>  }
>>
>>  static struct xenbus_watch sysrq_watch = {
>> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
>> index a34f4271a2e9..c2da1bc24091 100644
>> --- a/include/xen/xen-ops.h
>> +++ b/include/xen/xen-ops.h
>> @@ -215,6 +215,8 @@ static inline void xen_preemptible_hcall_end(void) { }
>>
>>  #endif /* CONFIG_XEN_PV && !CONFIG_PREEMPTION */
>>
>> +void xen_grant_dma_handle_sysrq(int key);
>> +
>>  #ifdef CONFIG_XEN_GRANT_DMA_OPS
>>  void xen_grant_setup_dma_ops(struct device *dev);
>>  bool xen_is_grant_dma_device(struct device *dev);
>> (END)
>>
>> So how it looks like:
>>
>> 1. DomU boots without PCI Host bridge probed. So nothing PCI related is
>> observed in DomU.
>>
>> cat /proc/device-tree/pcie@10000000/status
>> disabled
>>
>> 2. I run backends in DomD and after that issue a signal to "enable"
>>
>> root@generic-armv8-xt-dom0:~# xl sysrq DomU 4
>>
>> 3. The PCI Host bridge is probed, and all required PCI devices are
>> discovered
>>
>> root@generic-armv8-xt-dom0:~# xl console DomU
>> [  237.407620] xen_grant_dma_handle_sysrq: got key 52
>> [  237.408133] pci-host-generic 10000000.pcie: host bridge
>> /pcie@10000000 ranges:
>> [  237.408186] pci-host-generic 10000000.pcie:      MEM
>> 0x0023000000..0x0032ffffff -> 0x0023000000
>> [  237.408231] pci-host-generic 10000000.pcie:      MEM
>> 0x0100000000..0x01ffffffff -> 0x0100000000
>> [  237.408313] pci-host-generic 10000000.pcie: ECAM at [mem
>> 0x10000000-0x1fffffff] for [bus 00-ff]
>> [  237.408451] pci-host-generic 10000000.pcie: PCI host bridge to bus
>> 0000:00
>> [  237.408490] pci_bus 0000:00: root bus resource [bus 00-ff]
>> [  237.408517] pci_bus 0000:00: root bus resource [mem
>> 0x23000000-0x32ffffff]
>> [  237.408545] pci_bus 0000:00: root bus resource [mem
>> 0x100000000-0x1ffffffff pref]
>> [  237.409043] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
>> [  237.458045] pci 0000:00:01.0: [1af4:1041] type 00 class 0x020000
>> [  237.502588] pci 0000:00:01.0: reg 0x20: [mem 0x00000000-0x00003fff
>> 64bit pref]
>> [  237.507475] pci 0000:00:02.0: [1af4:1042] type 00 class 0x010000
>> [  237.552706] pci 0000:00:02.0: reg 0x20: [mem 0x00000000-0x00003fff
>> 64bit pref]
>> [  237.559847] pci 0000:00:01.0: BAR 4: assigned [mem
>> 0x100000000-0x100003fff 64bit pref]
>> [  237.560411] pci 0000:00:02.0: BAR 4: assigned [mem
>> 0x100004000-0x100007fff 64bit pref]
>> [  237.563324] virtio-pci 0000:00:01.0: Set up Xen grant DMA ops (rid
>> 0x8 sid 0x1)
>> [  237.564833] virtio-pci 0000:00:01.0: enabling device (0000 -> 0002)
>> [  237.582734] virtio-pci 0000:00:02.0: Set up Xen grant DMA ops (rid
>> 0x10 sid 0x1)
>> [  237.583413] virtio-pci 0000:00:02.0: enabling device (0000 -> 0002)
>> [  237.595712] virtio_blk virtio1: 4/0/0 default/read/poll queues
>> [  237.596227] virtio_net virtio0 enp0s1: renamed from eth1
>> [  237.602499] virtio_blk virtio1: [vda] 4096000 512-byte logical blocks
>> (2.10 GB/1.95 GiB)
>> [  237.606317] xen_grant_dma_handle_sysrq: enable /pcie@10000000
>>
>> 4. The same way the pseudo-hotremove would work (if we change the status
>> to "disabled" the corresponding device gets removed)
>>
>>
>> If this pseudo-hotplug sounds appropriate for the dom0less,
> This is great! Yes I think it is totally acceptable.


Perfect!


>
>
>> the one of the next questions would be what mechanism to use for
>> signalling (event, xenstore, whatever).
> For your information, we had to solve a similar issue a few months ago
> to let a domU discover a newly added and directly assigned programmable
> logic block. That was also done by applying DT overlays, first to Xen,
> then to the domU.
>
> Have a look at Vikram's Xen Summit presentation:
> https://urldefense.com/v3/__https://static.sched.com/hosted_files/xen2022/e8/Introduce*20Dynamic*20Device*20Node*20Programming*20for*20Xen.pdf__;JSUlJSUl!!GF_29dbcQIUBPA!yTJGjmmfrdJfoc-magnbk6EcuNspXxuQyDMzItgaV9mehJMj37w7Go_O9tDCQV3Qpij0PORZdiZsjBYROsaRNWVjPNUucg$ [static[.]sched[.]com]
>
> We wrote a small xenstore-based protocol to notify the domU and also to
> tranfer the overlay to it:
>
> https://urldefense.com/v3/__https://github.com/Xilinx/xen/blob/xlnx_rebase_4.16/docs/misc/arm/overlay.txt__;!!GF_29dbcQIUBPA!yTJGjmmfrdJfoc-magnbk6EcuNspXxuQyDMzItgaV9mehJMj37w7Go_O9tDCQV3Qpij0PORZdiZsjBYROsaRNWXs7xlouA$ [github[.]com]
> https://urldefense.com/v3/__https://github.com/Xilinx/xen/blob/xlnx_rebase_4.16/tools/helpers/get_overlay.c__;!!GF_29dbcQIUBPA!yTJGjmmfrdJfoc-magnbk6EcuNspXxuQyDMzItgaV9mehJMj37w7Go_O9tDCQV3Qpij0PORZdiZsjBYROsaRNWXoR3gEjw$ [github[.]com]
>
> There is a good description starting at slide 16 in the PDF.
>
>
> I am only sharing this as FYI.


Very interesting materials and impressive work! I can't even imagine
what it took to get it working.



> This Virtio problem is simpler because we
> already know the devices that are going to become available. We don't
> need an actual DT overlay to be passed to the domU. So we could get away
> with just a single interrupt or a single xenstore property.

I completely agree that for virtio it is going to be simpler than for FPGA.


>
>
>> Note that signal should only be sent if all backends which serve
>> virtio-pci devices within that PCI Host bridge are ready.
> Yes. That should be fine as long as all the backends are in the same
> domain. I can imagine there could be difficulties if the backends are
> in different domains: backend-domain-1 would have to tell dom0 that it
> is ready, then backend-domain-2 would have to do the same, then dom0
> finally notifies the domU, or something like that.
>
> Anyway, I think this is good enough to start as a solution. Excellent!
>
>
>>>>> Other ideas?
>>>> Another (crazy?) idea is to reuse CONFIG_XEN_VIRTIO_FORCE_GRANT for
>>>> dom0less system (I mean without "xen,grant-dma" bindings at all).
>>>> If virtio backends are always going to run in Dom0 when we have it up
>>>> and running, then it should work as domid == 0 is reserved for Dom0.
>>>> If there is a need to run virtio backends in other *backend* domain (for
>>>> the domain ID to be always known we could reserve an ID for it, so it
>>>> would be a const value),
>>>> we could probably introduce something configurable like
>>>> CONFIG_XEN_VIRTIO_FORCE_GRANT_BE_DOMID with 0 by default (or cmd line
>>>> option).
>>> The problem in a dom0less system is not much how to tell which is the
>>> backend domid, because that is known in advance and could be added to
>>> device tree at boot somehow. The issue is how to ask the frontend to
>>> "wait" and then how to tell the frontend to "proceed" after the backend
>>> comes online.
>> please see above.
>>
>>
>> To summarize:
>>
>> 1. For normal case there is no problem with communicating the backend
>> domid on Arm with device-tree (neither for virtio-mmio nor for virtio-pci),
>> for the virtio-pci the V2 (PCI-IOMMU bindings) should be used. For the
>> dom0less there won't be problem also as I understood from the discussion
>> (as the configuration is known in advance).
>> So I propose to concentrate on V2.
> Yes I agree
>
>
>> 2. The problem is in supporting virtio for the dom0less in general
>> despite whether it is a foreign or grant mappings.
>> Here we would need a (pseudo-)hotplug or some other method to start
>> operating only when backend is available.
> Yes I think you are right

--
Regards,

Oleksandr Tyshchenko