Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820and PCI BARs

From: Konrad Rzeszutek Wilk
Date: Fri Oct 25 2013 - 12:38:50 EST


On Fri, Oct 25, 2013 at 11:03:20AM -0400, Konrad Rzeszutek Wilk wrote:
> On bootup the E820 "gaps" or E820_RESV regions are marked as
> identity regions. Meaning that any lookup done in the P2M
> will return the same value: pfn_to_mfn(pfn) == pfn.
>
> This is needed for PCI devices so that drivers can reference
> the correct bus address. Unfortunatly there are also PCIe
> devices which setup their MMIO region above the E820. By default
> we assume in the P2M that any region above the last E820 entry
> is to be used for ballooning. That is not true - and we don't
> mark such regions as IDENTIY_FRAME but INVALID_P2M_ENTRY.
> The result is that any lookup in the P2M (pfn_to_mfn(pfn) == 0)
> gets us the 0 bus address which is hardly correct.
>
> A solution that this patch implements is to walk the PCI device
> BAR regions and mark them as IDENTITY_FRAMEs in the P2M.
> Naturally some checks are needed such as making sure that the
> BAR regions are not part of the balloon pages, nor regular RAM.
>
> We only set them to IDENTITY if the:
> pfn_to_mfn(pfn) == INVALID_P2M_ENTRY.
>
> Another solution would be to mark all P2M entries beyond the
> last E820 entry _and_ not in the balloon regions as IDENTITY.
>
> If done, that means in worst case we have to reserve MAX_DOMAIN_PAGES
> pages (so 2MB) of virtual space in case we have to create
> new P2M leafs. We could be fancy and make the P2M code understand
> p2m_mid_missing and p2_mid_identity and do the right things.
> But that is quite complex while this particular patch only
> gets invoked if there are PCI devices. Another solution (David
> Vrabel ideas) was to consider INVALID_P2M_ENTRY as 1-1 regions.
> The author of this patch is not sure of the ramifications
> as it would require surgery in various P2M codebits.
>
> Reported-by: Lukas Hejtmanek <xhejtman@xxxxxxxxxxx>
> Reported-by: Lance Larsh <lance.larsh@xxxxxxxxxx>
> CC: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> CC: David Vrabel <david.vrabel@xxxxxxxxxx>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
> drivers/xen/balloon.c | 19 +++++++++++++
> drivers/xen/pci.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++--
> include/xen/balloon.h | 1 +
> 3 files changed, 96 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index b232908..258e3f9 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -133,6 +133,25 @@ static void balloon_append(struct page *page)
> adjust_managed_page_count(page, -1);
> }
>
> +/*
> + * Check if any the balloon pages overlap with the supplied
> + * pfn and its range.
> + */
> +bool balloon_pfn(unsigned long pfn, unsigned long nr)
> +{
> + struct page *page;
> +
> + if (list_empty(&ballooned_pages))
> + return false;
> +
> + list_for_each_entry(page, &ballooned_pages, lru) {
> + unsigned long b_pfn = page_to_pfn(page);
> +
> + if (b_pfn >= pfn && b_pfn < pfn + nr)
> + return true;
> + }
> + return false;
> +}
> /* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
> static struct page *balloon_retrieve(bool prefer_highmem)
> {
> diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
> index 18fff88..6b86eda 100644
> --- a/drivers/xen/pci.c
> +++ b/drivers/xen/pci.c
> @@ -22,6 +22,9 @@
> #include <xen/xen.h>
> #include <xen/interface/physdev.h>
> #include <xen/interface/xen.h>
> +#include <xen/interface/memory.h>
> +#include <xen/page.h>
> +#include <xen/balloon.h>
>
> #include <asm/xen/hypervisor.h>
> #include <asm/xen/hypercall.h>
> @@ -127,6 +130,72 @@ static int xen_add_device(struct device *dev)
> return r;
> }
>
> +static void xen_p2m_add_device(struct device *dev)
> +{
> + int i;
> + struct pci_dev *pci_dev = to_pci_dev(dev);
> +
> + /* Verify whether the MMIO BARs are 1-1 in the P2M. */
> + for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> + unsigned long pfn, start, end, ok_pfns;
> + char bus_addr[64];
> + char *fmt;
> +
> + if (!pci_resource_len(pci_dev, i))
> + continue;
> +
> + if (pci_resource_flags(pci_dev, i) == IORESOURCE_IO)
> + fmt = " (bus address [%#06llx-%#06llx])";
> + else
> + fmt = " (bus address [%#010llx-%#010llx])";
> +
> + snprintf(bus_addr, sizeof(bus_addr), fmt,
> + (unsigned long long) (pci_resource_start(pci_dev, i)),
> + (unsigned long long) (pci_resource_end(pci_dev, i)));
> +
> + start = pci_resource_start(pci_dev, i) >> PAGE_SHIFT;
> + end = pci_resource_end(pci_dev, i) >> PAGE_SHIFT;
> +
> + /*
> + * We don't worry about the balloon scratch page as it has a
> + * valid PFN - which means we will catch in the loop below.
> + */
> + if (balloon_pfn(start, end - start)) {
> + dev_warn(dev, "%s is within balloon pages!\n", bus_addr);
> + continue;
> + }
> +
> + for (ok_pfns = 0, pfn = start; pfn < end; pfn++) {
> + unsigned long mfn = pfn_to_mfn(pfn);
> +
> + if (mfn == pfn) {
> + ok_pfns++;
> + continue;
> + }
> + if (mfn != INVALID_P2M_ENTRY) { /* RAM */
> + dev_warn(dev, "%s is within RAM [%lx] region!\n", bus_addr, pfn);
> + break;
> + }
> + }
> + if (ok_pfns == end - start) /* All good. */
> + continue;
> +
> + if (pfn != end - 1) /* We broke out of the loop above. */
> + continue;

There are some bugs in this code so please wait until the next posting
which hopefully will also include me testing it on affected hardware.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/