RE: [PATCH] PCI: Add a mutex to protect the global list pci_domain_busn_res_list

From: Haiyang Zhang
Date: Fri Apr 19 2024 - 11:08:01 EST

> -----Original Message-----
> From: Dexuan Cui <decui@xxxxxxxxxxxxx>
> Sent: Thursday, April 18, 2024 9:53 PM
> To: bhelgaas@xxxxxxxxxx; wei.liu@xxxxxxxxxx; KY Srinivasan
> <kys@xxxxxxxxxxxxx>; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>;
> lpieralisi@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx
> Cc: linux-hyperv@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Boqun
> Feng <Boqun.Feng@xxxxxxxxxxxxx>; Sunil Muthuswamy
> <sunilmut@xxxxxxxxxxxxx>; Saurabh Singh Sengar <ssengar@xxxxxxxxxxxxx>;
> Dexuan Cui <decui@xxxxxxxxxxxxx>
> Subject: [PATCH] PCI: Add a mutex to protect the global list
> pci_domain_busn_res_list
>
> There has been an effort to make the pci-hyperv driver support
> async-probing to reduce the boot time. With async-probing, multiple
> kernel threads can be running hv_pci_probe() -> create_root_hv_pci_bus()
> -> pci_scan_root_bus_bridge() -> pci_bus_insert_busn_res() at the same
> time to update the global list, causing list corruption.
>
> Add a mutex to protect the list.
>
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
> ---
> drivers/pci/probe.c | 25 ++++++++++++++++++-------
> 1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e19b79821dd6..1327fd820b24 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -37,6 +37,7 @@ LIST_HEAD(pci_root_buses);
>  EXPORT_SYMBOL(pci_root_buses);
>
>  static LIST_HEAD(pci_domain_busn_res_list);
> +static DEFINE_MUTEX(pci_domain_busn_res_list_lock);
>
>  struct pci_domain_busn_res {
>  	struct list_head list;
> @@ -47,14 +48,22 @@ struct pci_domain_busn_res {
>  static struct resource *get_pci_domain_busn_res(int domain_nr)
>  {
>  	struct pci_domain_busn_res *r;
> +	struct resource *ret;
>
> -	list_for_each_entry(r, &pci_domain_busn_res_list, list)
> -		if (r->domain_nr == domain_nr)
> -			return &r->res;
> +	mutex_lock(&pci_domain_busn_res_list_lock);
> +
> +	list_for_each_entry(r, &pci_domain_busn_res_list, list) {
> +		if (r->domain_nr == domain_nr) {
> +			ret = &r->res;
> +			goto out;
> +		}
> +	}
>
>  	r = kzalloc(sizeof(*r), GFP_KERNEL);
> -	if (!r)
> -		return NULL;
> +	if (!r) {
> +		ret = NULL;
> +		goto out;
> +	}
>
>  	r->domain_nr = domain_nr;
>  	r->res.start = 0;
> @@ -62,8 +71,10 @@ static struct resource *get_pci_domain_busn_res(int domain_nr)
>  	r->res.flags = IORESOURCE_BUS | IORESOURCE_PCI_FIXED;
>
>  	list_add_tail(&r->list, &pci_domain_busn_res_list);
> -
> -	return &r->res;
> +	ret = &r->res;
> +out:
> +	mutex_unlock(&pci_domain_busn_res_list_lock);
> +	return ret;
>  }
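
To make sure I read the change correctly: the lookup and the insert both
happen under one lock, so two callers asking for the same domain cannot
both miss and both add an entry. Below is a minimal userspace sketch of
that pattern, assuming pthreads in place of the kernel mutex and a plain
singly linked list. The names get_domain_res, struct domain_res and
res_list_lock are illustrative only, not kernel symbols.

```c
/* Userspace model of lookup-or-insert under one lock; not kernel code. */
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_domain_busn_res. */
struct domain_res {
	struct domain_res *next;
	int domain_nr;
	int res;		/* placeholder for struct resource */
};

static struct domain_res *res_list;
static pthread_mutex_t res_list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Return the entry for domain_nr, creating it if it does not exist yet.
 * Returns NULL if the allocation fails.
 */
static int *get_domain_res(int domain_nr)
{
	struct domain_res *r;
	int *ret = NULL;

	pthread_mutex_lock(&res_list_lock);

	/* Lookup: reuse an existing entry for this domain. */
	for (r = res_list; r; r = r->next) {
		if (r->domain_nr == domain_nr) {
			ret = &r->res;
			goto out;
		}
	}

	/* Insert: still under the lock, so no other caller can race us. */
	r = calloc(1, sizeof(*r));
	if (!r)
		goto out;

	r->domain_nr = domain_nr;
	r->next = res_list;	/* head insertion here; the patch appends */
	res_list = r;
	ret = &r->res;
out:
	pthread_mutex_unlock(&res_list_lock);
	return ret;
}
```

In this sketch, two calls with the same domain number return the same
pointer, and a different domain number gets its own entry.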

The patch is for the common PCI code. So has this bug been there for a
while? Do you have a sample stack trace of the crash?

I checked pci-hyperv: it doesn't set .driver.probe_type, so
PROBE_DEFAULT_STRATEGY is in effect. driver_allows_async_probing() returns
false unless a kernel or module parameter requests async probing. So async
probing hasn't been exercised here yet.
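
Roughly, the decision I am describing looks like the simplified model
below. This is a sketch of my reading, not the kernel function itself:
model_allows_async_probing and param_requests_async are made-up names, and
the single bool stands in for the kernel and module parameter checks. Only
the enumerator names are taken from the kernel's enum probe_type.

```c
#include <stdbool.h>

/* Enumerator names mirror the kernel's enum probe_type. */
enum probe_type {
	PROBE_DEFAULT_STRATEGY,
	PROBE_PREFER_ASYNCHRONOUS,
	PROBE_FORCE_SYNCHRONOUS,
};

/*
 * Hypothetical model of the async-probe decision.
 * param_requests_async: assumed to be true when a kernel or module
 * parameter asks for async probing of this driver.
 */
static bool model_allows_async_probing(enum probe_type type,
				       bool param_requests_async)
{
	switch (type) {
	case PROBE_PREFER_ASYNCHRONOUS:
		return true;
	case PROBE_FORCE_SYNCHRONOUS:
		return false;
	default:
		/* PROBE_DEFAULT_STRATEGY: async only when requested. */
		return param_requests_async;
	}
}
```

With pci-hyperv as it is today, this corresponds to the
PROBE_DEFAULT_STRATEGY case with no parameter set, which yields false.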

If we change pci-hyperv's probe_type to PROBE_PREFER_ASYNCHRONOUS in the
future, how does that affect probing of the underlying PCI devices of the
same device type?
For example, the MANA driver doesn't set probe_type. Will pci-hyperv's
async probing cause async probing, or potentially nondeterministic naming,
for MANA devices?

Thanks,
- Haiyang