Re: [PATCH v3 05/11] PCI: beef up pci_do_scan_bus()

From: Alex Chiang
Date: Wed Mar 18 2009 - 16:39:52 EST


* Kenji Kaneshige <kaneshige.kenji@xxxxxxxxxxxxxx>:
> Alex Chiang wrote:
>> The more I think about it though, the more I think that even
>> without the below patch to clean up the callers of
>> pci_do_scan_bus, we should be ok, because:
>>
>> - all the old code (which I removed below) existed
>> because the old PCI core would refuse to scan PCI buses
>> that had already been discovered
>>
>> - that meant that it would never descend past a known
>> bridge to try and find new child bridges
>>
>> - that meant that hotplug drivers had to manually
>> discover new bridges and add them, essentially
>> duplicating functionality in pci_scan_bridge
>>
>> This patch series allows the PCI core to scan existing bridges
>> and descend down into the children every time, looking for new
>> bridges and devices, so all the code in shpchp, cpcihp, and other
>> callers of pci_do_scan_bus shouldn't be necessary anymore.
>>
>> Also, if we do add new bridges once manually in shpchp, and then
>> call the new pci_do_scan_bus again, we will _not_ add devices
>> twice because the core should check each bridge and device for
>> struct pci_dev.is_added.
>>
>> So anyway, I think that cleaning up the callers of
>> pci_do_scan_bus is a good idea, but multiple calls to the
>> interface definitely should not result in problems. If they do,
>> then that's a bug in my patch series.
>>
>
> I'm sorry, but I didn't have enough time to try your patch on
> my environment. So I'm still just looking at the code.

Ok.

> I looked at shpchp_configure_device() from the view point of
> bridge hot-add. I think it is broken regardless of your change
> because it calls pci_bus_add_devices() (through pci_do_scan_bus)
> before assigning resources. So I think it must be changed
> regardless of your change. But it's a little difficult for me
> because I don't have any test environment as I mentioned before.

Hm, what you say makes sense.

I managed to find a very old machine supported by cpqphp, and
also found a card with a bridge.

cpqhp_configure_device() follows a similar algorithm to
shpchp_configure_device(). I'm just starting my testing now, and
there is good news and bad news.

The bad news is that although cpqphp loads successfully, and we
can successfully offline a card, we cannot online it again
afterwards due to BAR collisions. This failure occurs even
without my changes (2.6.27 kernel), and I haven't had time to
track the regression down yet.

We do discover the bridge on the device correctly and it is added
back into the device tree correctly, but we can't use it because
it's not programmed correctly.

The good news is, after rewriting cpqphp_configure_device() to
resemble the shpchp patch I gave you, we still discover the
bridge correctly and add it back into the device tree in the
proper place. We no longer get BAR collisions, but we fail in a
slightly different way.

At least I'm not introducing a new regression in cpqphp, and I
suspect shpchp will be similar.

> But I'm still worrying about your change against pci_do_scan_bus().
> Without your change, pci_do_scan_bus() scans child buses and add
> devices without assigning resources. I guess that it means existing
> callers of pci_do_scan_bus() have some mechanism to assign resource
> by theirselves and they don't expect pci_do_scan_bus() assigns
> resources.

I looked through shpchp and couldn't find this assumption. Is it
stored in the struct controller, under mmio_base and mmio_size?

I am motivated to get this patch series into 2.6.30 for several
reasons, so I think for now, I will not change pci_do_scan_bus().
Instead, I'll create a new interface that only the PCI core will
use, and leave the drivers alone.

Over time, we can migrate the drivers to the PCI core interface.

> By the way, I have one question about rescan. Please suppose that
> we enable the bridge(B) and its children using rescan interface
> in the picture below.
>
> |
> -------------------------------------- parent bus
> | |
> bridge(A) bridge(B)
> (working) (Not working)
> | |
> ------------- -------------
> | | | |
> dev dev dev dev
> (working) (working) (Not working)
>
> In this case, your rescan mechanism calls pci_do_scan_bus() for
> parent bus, and pci_do_scan_bus() calls pci_bus_assign_resources()
> for parent bus. My question is, does pci_bus_assign_resources() do
> nothing against bridge(A) that is currently working? I guess
> pci_bus_assign_resources() would update some registers of bridge(A)
> and it would breaks currently working devices.

This is a very good catch, thank you.

I added another patch to prevent this situation. We now check to
see if the bridge is already added inside of pci_setup_bridge().

Thanks.

/ac

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/