Re: [PATCH v3 00/14] PCI/P2PDMA: Support transactions that hit the host bridge
From: Bjorn Helgaas
Date: Fri Aug 16 2019 - 09:45:47 EST
On Mon, Aug 12, 2019 at 11:30:34AM -0600, Logan Gunthorpe wrote:
> Bjorn, this is v3 of the patchset. Christophs suggestion's required a
> bit more rework than could be expressed in incremental patches so
> I've resent the whole thing. I started with your p2pdma branch though
> so it should have all the changes you already made. I've included a
> range-diff from your p2pdma branch below.
I added Christoph's reviewed-by and replaced my pci/p2pdma branch with
this series, thanks! This is all headed for v5.4.
> A git branch is available here:
>
> https://github.com/sbates130272/linux-p2pmem/ p2pdma_rc_map_v3
>
> --
>
> Changes in v3:
> * Rebase on v5.3-rc3 (No changes)
> * Bjorn's edits for a bunch of the comments and commit messages
> * Rework upstream_bridge_distance() changes to split the distance,
> mapping type and ACS flag into separate pass-by-reference parameters
> (per Christoph's suggestion)
> * Jonathan Derrick (Intel) reported the domains can be 32-bits on
> some machines and thus the map_type_index() needed to be an unsigned long
> to accomadate this.
>
> Changes in v2:
> * Rebase on v5.3-rc2 (No changes)
> * Re-introduce the private pagemap structure and move the p2p-specific
> elements out of the commond dev_pagemap (per Christoph)
> * Use flags instead of bool in the whitelist (per Jason)
> * Only store the mapping type in the xarray (instead of the distance
> with flags) such that a function can return the mapping method
> with a switch statement to decide how to map. (per Christoph)
> * Drop find_parent_pci_dev() on the fast path and rely on the fact
> that the struct device passed to the mapping functions *must* be
> a PCI device and convert it directly. (per suggestions from
> Christoph and Jason)
> * Collected Christian's Reviewed-by's
>
> --
>
> As discussed on the list previously, in order to fully support the
> whitelist Christian added with the IOMMU, we must ensure that we
> map any buffer going through the IOMMU with an aprropriate dma_map
> call. This patchset accomplishes this by cleaning up the output of
> upstream_bridge_distance() to better indicate the mapping requirements,
> caching these requirements in an xarray, then looking them up at map
> time and applying the appropriate mapping method.
>
> After this patchset, it's possible to use the NVMe-of P2P support to
> transfer between devices without a switch on the whitelisted root
> complexes. A couple Intel device I have tested this on have also
> been added to the white list.
>
> Most of the changes are contained within the p2pdma.c, but there are
> a few minor touches to other subsystems, mostly to add support
> to call an unmap function.
>
> --
>
> Logan Gunthorpe (14):
> PCI/P2PDMA: Introduce private pagemap structure
> PCI/P2PDMA: Add provider's pci_dev to pci_p2pdma_pagemap struct
> PCI/P2PDMA: Add constants for map type results to
> upstream_bridge_distance()
> PCI/P2PDMA: Factor out __upstream_bridge_distance()
> PCI/P2PDMA: Apply host bridge whitelist for ACS
> PCI/P2PDMA: Factor out host_bridge_whitelist()
> PCI/P2PDMA: Whitelist some Intel host bridges
> PCI/P2PDMA: Add attrs argument to pci_p2pdma_map_sg()
> PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg()
> PCI/P2PDMA: Factor out __pci_p2pdma_map_sg()
> PCI/P2PDMA: Store mapping method in an xarray
> PCI/P2PDMA: dma_map() requests that traverse the host bridge
> PCI/P2PDMA: Allow IOMMU for host bridge whitelist
> PCI/P2PDMA: Update pci_p2pdma_distance_many() documentation
>
> drivers/infiniband/core/rw.c | 6 +-
> drivers/nvme/host/pci.c | 10 +-
> drivers/pci/p2pdma.c | 374 +++++++++++++++++++++++++----------
> include/linux/memremap.h | 1 -
> include/linux/pci-p2pdma.h | 28 ++-
> 5 files changed, 300 insertions(+), 119 deletions(-)
>
> --
>
> $ git range-diff bjorn/master..bjorn/pci/p2pdma master..p2pdma_rc_map_v3
>
> 1: 5173c82b3571 = 1: 78cc3a5be538 PCI/P2PDMA: Introduce private pagemap structure
> 2: f33fd6b8da93 = 2: 9fb388dff957 PCI/P2PDMA: Add provider's pci_dev to pci_p2pdma_pagemap struct
> 3: 767f47b59702 < -: ------------ PCI/P2PDMA: Add constants for not-supported result upstream_bridge_distance()
> -: ------------ > 3: ed105432f644 PCI/P2PDMA: Add constants for map type results to upstream_bridge_distance()
> 4: 93ed41974d69 ! 4: 590aaea31997 PCI/P2PDMA: Factor out __upstream_bridge_distance()
> @@ -17,8 +17,8 @@
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@
> - P2PDMA_NOT_SUPPORTED = 0x08000000,
> - };
> + return false;
> + }
>
> -/*
> - * Find the distance through the nearest common upstream bridge between
> @@ -44,31 +44,29 @@
> - * port of the switch, to the common upstream port, back up to the second
> - * downstream port and then to Device B.
> - *
> -- * Any two devices that cannot communicate using p2pdma will return the
> -- * distance with the flag P2PDMA_NOT_SUPPORTED.
> +- * Any two devices that cannot communicate using p2pdma will return
> +- * PCI_P2PDMA_MAP_NOT_SUPPORTED.
> - *
> - * Any two devices that have a data path that goes through the host bridge
> - * will consult a whitelist. If the host bridges are on the whitelist,
> -- * the distance will be returned with the flag P2PDMA_THRU_HOST_BRIDGE set.
> -- * If either bridge is not on the whitelist, the flag P2PDMA_NOT_SUPPORTED will
> -- * be set.
> +- * this function will return PCI_P2PDMA_MAP_THRU_HOST_BRIDGE.
> +- *
> +- * If either bridge is not on the whitelist this function returns
> +- * PCI_P2PDMA_MAP_NOT_SUPPORTED.
> - *
> -- * If a bridge which has any ACS redirection bits set is in the path
> -- * this function will flag the result with P2PDMA_ACS_FORCES_UPSTREAM.
> -- * In this case, a list of all infringing bridge addresses will be
> -- * populated in acs_list (assuming it's non-null) for printk purposes.
> +- * If a bridge which has any ACS redirection bits set is in the path,
> +- * acs_redirects will be set to true. In this case, a list of all infringing
> +- * bridge addresses will be populated in acs_list (assuming it's non-null)
> +- * for printk purposes.
> - */
> --static int upstream_bridge_distance(struct pci_dev *provider,
> -- struct pci_dev *client,
> -- struct seq_buf *acs_list)
> -+static int __upstream_bridge_distance(struct pci_dev *provider,
> -+ struct pci_dev *client,
> -+ struct seq_buf *acs_list)
> + static enum pci_p2pdma_map_type
> +-upstream_bridge_distance(struct pci_dev *provider, struct pci_dev *client,
> ++__upstream_bridge_distance(struct pci_dev *provider, struct pci_dev *client,
> + int *dist, bool *acs_redirects, struct seq_buf *acs_list)
> {
> struct pci_dev *a = provider, *b = client, *bb;
> - int dist_a = 0;
> @@
> - return dist_a + dist_b;
> + return PCI_P2PDMA_MAP_BUS_ADDR;
> }
>
> +/*
> @@ -95,27 +93,29 @@
> + * port of the switch, to the common upstream port, back up to the second
> + * downstream port and then to Device B.
> + *
> -+ * Any two devices that cannot communicate using p2pdma will return the
> -+ * distance with the flag P2PDMA_NOT_SUPPORTED.
> ++ * Any two devices that cannot communicate using p2pdma will return
> ++ * PCI_P2PDMA_MAP_NOT_SUPPORTED.
> + *
> + * Any two devices that have a data path that goes through the host bridge
> + * will consult a whitelist. If the host bridges are on the whitelist,
> -+ * the distance will be returned with the flag P2PDMA_THRU_HOST_BRIDGE set.
> -+ * If either bridge is not on the whitelist, the flag P2PDMA_NOT_SUPPORTED will
> -+ * be set.
> ++ * this function will return PCI_P2PDMA_MAP_THRU_HOST_BRIDGE.
> + *
> -+ * If a bridge which has any ACS redirection bits set is in the path
> -+ * this function will flag the result with P2PDMA_ACS_FORCES_UPSTREAM.
> -+ * In this case, a list of all infringing bridge addresses will be
> -+ * populated in acs_list (assuming it's non-null) for printk purposes.
> ++ * If either bridge is not on the whitelist this function returns
> ++ * PCI_P2PDMA_MAP_NOT_SUPPORTED.
> ++ *
> ++ * If a bridge which has any ACS redirection bits set is in the path,
> ++ * acs_redirects will be set to true. In this case, a list of all infringing
> ++ * bridge addresses will be populated in acs_list (assuming it's non-null)
> ++ * for printk purposes.
> + */
> -+static int upstream_bridge_distance(struct pci_dev *provider,
> -+ struct pci_dev *client,
> -+ struct seq_buf *acs_list)
> ++static enum pci_p2pdma_map_type
> ++upstream_bridge_distance(struct pci_dev *provider, struct pci_dev *client,
> ++ int *dist, bool *acs_redirects, struct seq_buf *acs_list)
> +{
> -+ return __upstream_bridge_distance(provider, client, acs_list);
> ++ return __upstream_bridge_distance(provider, client, dist,
> ++ acs_redirects, acs_list);
> +}
> +
> - static int upstream_bridge_distance_warn(struct pci_dev *provider,
> - struct pci_dev *client)
> - {
> + static enum pci_p2pdma_map_type
> + upstream_bridge_distance_warn(struct pci_dev *provider, struct pci_dev *client,
> + int *dist)
> 5: f975e51cf5df < -: ------------ PCI/P2PDMA: Apply host bridge whitelist for ACS
> -: ------------ > 5: 336e968f075b PCI/P2PDMA: Apply host bridge whitelist for ACS
> 6: 59b6507ac07c ! 6: 3f565ce6e1d4 PCI/P2PDMA: Factor out host_bridge_whitelist()
> @@ -58,15 +58,16 @@
> + return false;
> +}
> +
> - enum {
> - /*
> - * These arbitrary offset are or'd onto the upstream distance
> + static enum pci_p2pdma_map_type
> + __upstream_bridge_distance(struct pci_dev *provider, struct pci_dev *client,
> + int *dist, bool *acs_redirects, struct seq_buf *acs_list)
> @@
> - if (!(dist & P2PDMA_THRU_HOST_BRIDGE))
> - return dist;
> + acs_redirects, acs_list);
>
> -- if (root_complex_whitelist(provider) && root_complex_whitelist(client))
> -+ if (host_bridge_whitelist(provider, client))
> - return dist;
> + if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE) {
> +- if (!root_complex_whitelist(provider) ||
> +- !root_complex_whitelist(client))
> ++ if (!host_bridge_whitelist(provider, client))
> + return PCI_P2PDMA_MAP_NOT_SUPPORTED;
> + }
>
> - return dist | P2PDMA_NOT_SUPPORTED;
> 7: 5ecec567445f = 7: 7b8701dd45e0 PCI/P2PDMA: Whitelist some Intel host bridges
> 8: 68a37758d5cf = 8: b52771ae7f95 PCI/P2PDMA: Add attrs argument to pci_p2pdma_map_sg()
> 9: 4b821298d4f7 = 9: 88a554eb39c9 PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg()
> 10: 3f2dac803737 = 10: 604260bd186a PCI/P2PDMA: Factor out __pci_p2pdma_map_sg()
> 11: fc402d621534 ! 11: b55fc423a41a PCI/P2PDMA: Store mapping method in an xarray
> @@ -18,14 +18,10 @@
> #include <linux/seq_buf.h>
> #include <linux/iommu.h>
> +#include <linux/xarray.h>
> -+
> -+enum pci_p2pdma_map_type {
> -+ PCI_P2PDMA_MAP_UNKNOWN = 0,
> -+ PCI_P2PDMA_MAP_NOT_SUPPORTED,
> -+ PCI_P2PDMA_MAP_BUS_ADDR,
> -+ PCI_P2PDMA_MAP_THRU_IOMMU,
> -+};
>
> + enum pci_p2pdma_map_type {
> + PCI_P2PDMA_MAP_UNKNOWN = 0,
> +@@
> struct pci_p2pdma {
> struct gen_pool *pool;
> bool p2pmem_published;
> @@ -51,10 +47,10 @@
> if (!p2p->pool)
> goto out;
> @@
> - return dist_a + dist_b;
> + return PCI_P2PDMA_MAP_BUS_ADDR;
> }
>
> -+static int map_types_idx(struct pci_dev *client)
> ++static unsigned long map_types_idx(struct pci_dev *client)
> +{
> + return (pci_domain_nr(client->bus) << 16) |
> + (client->bus->number << 8) | client->devfn;
> @@ -64,37 +60,17 @@
> * Find the distance through the nearest common upstream bridge between
> * two PCI devices.
> @@
> - struct pci_dev *client,
> - struct seq_buf *acs_list)
> - {
> -+ enum pci_p2pdma_map_type map_type;
> - int dist;
> -
> - dist = __upstream_bridge_distance(provider, client, acs_list);
>
> -- if (!(dist & P2PDMA_THRU_HOST_BRIDGE))
> -- return dist;
> -+ if (!(dist & P2PDMA_THRU_HOST_BRIDGE)) {
> -+ map_type = PCI_P2PDMA_MAP_BUS_ADDR;
> -+ goto store_map_type_and_return;
> -+ }
> -+
> -+ if (host_bridge_whitelist(provider, client)) {
> -+ map_type = PCI_P2PDMA_MAP_THRU_IOMMU;
> -+ } else {
> -+ dist |= P2PDMA_NOT_SUPPORTED;
> -+ map_type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
> -+ }
> + if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE) {
> + if (!host_bridge_whitelist(provider, client))
> +- return PCI_P2PDMA_MAP_NOT_SUPPORTED;
> ++ map_type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
> + }
>
> -- if (host_bridge_whitelist(provider, client))
> -- return dist;
> -+store_map_type_and_return:
> + if (provider->p2pdma)
> + xa_store(&provider->p2pdma->map_types, map_types_idx(client),
> + xa_mk_value(map_type), GFP_KERNEL);
> -
> -- return dist | P2PDMA_NOT_SUPPORTED;
> -+ return dist;
> ++
> + return map_type;
> }
>
> - static int upstream_bridge_distance_warn(struct pci_dev *provider,
> 12: c51eb851e9da ! 12: 94e4c8633459 PCI/P2PDMA: dma_map() requests that traverse the host bridge
> @@ -44,7 +44,7 @@
> + client = to_pci_dev(dev);
> +
> + switch (pci_p2pdma_map_type(p2p_pgmap->provider, client)) {
> -+ case PCI_P2PDMA_MAP_THRU_IOMMU:
> ++ case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> + return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
> + case PCI_P2PDMA_MAP_BUS_ADDR:
> + return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
> @@ -71,7 +71,7 @@
> +
> + map_type = pci_p2pdma_map_type(p2p_pgmap->provider, client);
> +
> -+ if (map_type == PCI_P2PDMA_MAP_THRU_IOMMU)
> ++ if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
> + dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
> }
> EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
> 13: 0a3468f51621 = 13: f067cdb5b963 PCI/P2PDMA: Allow IOMMU for host bridge whitelist
> 14: 20c0cf9f4c9c = 14: 87c84b001dfa PCI/P2PDMA: Update pci_p2pdma_distance_many() documentation
>
> --
> 2.20.1