Re: [PATCH v3 8/8] PCI: Relax bridge window tail sizing rules

From: Andy Shevchenko
Date: Tue May 07 2024 - 09:52:22 EST


On Tue, May 07, 2024 at 01:25:23PM +0300, Ilpo Järvinen wrote:
> During a remove & rescan cycle, the PCI subsystem will recalculate and
> adjust the bridge window sizing that was initially done by "BIOS". The
> size calculation is based on the required alignment of the largest
> resource among the downstream resources as per pbus_size_mem()
> (unimportant or zero parameters marked with "..."):
>
> min_align = calculate_mem_align(aligns, max_order);
> size0 = calculate_memsize(size, ..., min_align);
>
> inside calculate_mem_align(), for the largest alignment:
> min_align = align1 >> 1;
> ...
> return min_align;
>
> and then in calculate_memsize():
> return ALIGN(max(size, ...), align);
>
> If the original bridge window sizing tried to conserve space, this will
> lead to a massive increase in the required bridge window size when the
> downstream has a large disparity in BAR sizes. E.g., with 16MiB and
> 16GiB BARs this results in a 24GiB bridge window size even if the 16MiB
> BAR does not require gigabytes of space to fit.
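
For illustration, the 24GiB figure can be reproduced in userspace. This is
a minimal sketch, not the kernel code: model_mem_align() is a hypothetical
simplified model of the alignment loop described above, align_up() stands
in for ALIGN(), and the aligns[] layout (summed resource size per 1MiB << order
alignment class) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Model of the alignment loop: aligns[order] holds the summed size of
 * all resources whose alignment is 1MiB << order.
 */
static uint64_t model_mem_align(const uint64_t *aligns, int max_order)
{
	uint64_t align = 0, min_align = 0;

	for (int order = 0; order <= max_order; order++) {
		uint64_t align1 = MiB << order;

		if (!align)
			min_align = align1;
		else if (align_up(align + min_align, min_align) < align1)
			min_align = align1 >> 1;
		align += aligns[order];
	}
	return min_align;
}

/* Bridge window size for one 16MiB BAR plus one 16GiB BAR. */
static uint64_t model_window_size(void)
{
	uint64_t aligns[15] = { 0 };

	aligns[4] = 16 * MiB;	/* 16MiB == 1MiB << 4 */
	aligns[14] = 16 * GiB;	/* 16GiB == 1MiB << 14 */

	uint64_t min_align = model_mem_align(aligns, 14);

	return align_up(16 * GiB + 16 * MiB, min_align);
}
```

Under these assumptions min_align comes out as 8GiB (half of the largest
alignment), so the 16GiB + 16MiB of actual resources gets rounded up to
24GiB (0x600000000), which matches the size in the failure log.
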
>
> When doing remove & rescan for a bus that contains such a PCI device, a
> larger bridge window is suddenly required on rescan but when there is a
> bridge window upstream that is already assigned based on the original
> size, it cannot be enlarged to the new requirement. This causes the
> allocation of the bridge window to fail (0x600000000 > 0x400ffffff):
>
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:01:00.0: PCI bridge to [bus 02-04]
> pci 0000:01:00.0: bridge window [mem 0x40400000-0x406fffff]
> pci 0000:01:00.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
>
> pci 0000:03:00.0: device released
> pci 0000:02:01.0: device released
> pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 0
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 0
> pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]
> pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]
> pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]
>
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
> pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
> pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: can't assign; no space
> pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: failed to assign
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
> pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space
> pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: failed to assign
> pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space
> pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: failed to assign
> pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]
>
> This is a major surprise for users who are suddenly left with a
> non-working PCIe device that was working fine with the original bridge
> window sizing.
>
> Even if the already assigned bridge window could be enlarged by
> reallocation in some cases (something the current code does not attempt
> to do), it is not possible in the general case and the large amount of
> wasted space at the tail of the bridge window may lead to other
> resource exhaustion problems at the Root Complex level (think of
> multiple PCIe cards with VFs and BAR size disparity in a single system).
>
> PCI specifications only expect natural alignment for BARs (PCI Express
> Base Specification, rev. 6.1 sect. 7.5.1.2.1) and minimum of 1MiB
> alignment for the bridge window (PCI Express Base Specification,
> rev 6.1 sect. 7.5.1.3). The current bridge window tail alignment rule
> was introduced in the commit 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI
> allocation code (alpha, arm, parisc) [2/2]") that only states:
> "pbus_size_mem: core stuff; tested with randomly generated sets of
> resources". It does not explain the motivation for the extra tail space
> allocated that is not truly needed by the downstream resources. As
> such, it is far from clear if it ever has been required by any HW.
>
> To prevent PCIe cards with BAR size disparity from becoming unusable
> after remove & rescan cycle, attempt to do a truly minimal allocation
> for memory resources if needed. First check if the normally calculated
> bridge window will not fit into an already assigned upstream resource.
> In such case, try with relaxed bridge window tail sizing rules instead
> where no extra tail space is requested beyond what the downstream
> resources require. Only enforce the alignment requirement of the bridge
> window itself (normally 1MiB).
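
The fallback described above can be sketched roughly as follows. This is a
simplified userspace model under stated assumptions, not the code in the
patch: pick_window_size() and its parameters are hypothetical names, and
the "does it fit upstream" check is reduced to a plain size comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M 0x00100000ULL

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: pick the window size to request. strict_align is
 * the tail alignment from the normal calculation; avail is the space
 * left in the already assigned upstream window. Sets *relaxed when the
 * fallback was taken.
 */
static uint64_t pick_window_size(uint64_t size, uint64_t strict_align,
				 uint64_t avail, bool *relaxed)
{
	uint64_t strict = align_up(size, strict_align);

	*relaxed = strict > avail;
	if (!*relaxed)
		return strict;
	/* Only the bridge window's own alignment is enforced. */
	return align_up(size, SZ_1M);
}
```

With the numbers from the logs (0x401000000 of downstream resources, 8GiB
tail alignment, 0x401000000 available upstream), the strict size of
0x600000000 does not fit and the relaxed size of 0x401000000 does.
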
>
> With this patch, the resources are successfully allocated:
>
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
> pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
> pcieport 0000:01:00.0: Assigned bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 02-04] cannot fit 0x600000000 required for 0000:02:01.0 bridging to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 03] requires relaxed alignment rules
> pcieport 0000:01:00.0: Assigned bridge window [mem 0x40400000-0x406fffff] to [bus 02-04] free space at [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]: assigned
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
> pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]: assigned
> pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]: assigned
> pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
>
> This patch draws inspiration from the initial investigations and work
> by Mika Westerberg.

..

> + min_align = 1ULL << (max_order + __ffs(SZ_1M));

In case of a new version of the series, this can utilise BIT_ULL().
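
I.e. something like min_align = BIT_ULL(max_order + __ffs(SZ_1M)). A quick
userspace sketch of the equivalence, where BIT_ULL() and SZ_1M are local
stand-ins for the kernel definitions, model_ffs() is a hypothetical
replacement for __ffs() built on the GCC/Clang __builtin_ctzl(), and the two
wrapper functions exist only for the comparison:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel definitions (assumptions). */
#define SZ_1M		0x00100000UL
#define BIT_ULL(nr)	(1ULL << (nr))

/* Index of the lowest set bit; undefined for 0, like the kernel's __ffs(). */
static unsigned int model_ffs(unsigned long word)
{
	assert(word != 0);
	return (unsigned int)__builtin_ctzl(word);
}

/* The expression as written in the patch. */
static uint64_t min_align_shift(int max_order)
{
	return 1ULL << (max_order + model_ffs(SZ_1M));
}

/* The same value spelled with BIT_ULL(). */
static uint64_t min_align_bit(int max_order)
{
	return BIT_ULL(max_order + model_ffs(SZ_1M));
}
```

Both forms give the same value for every order; BIT_ULL() only makes the
intent (a single bit at that position) easier to read.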

--
With Best Regards,
Andy Shevchenko