Re: Issue with RTL8111 NIC after upgrade to kernel 4.19

From: Heiner Kallweit
Date: Thu Nov 22 2018 - 14:48:56 EST


On 22.11.2018 19:57, Andrew Lunn wrote:
>> Thanks a lot for testing. Could you please test also the following
>> as an alternative to the delay?
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 55202a0ac..aeccb2323 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -2254,6 +2254,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
>> new_driver->mdiodrv.driver.probe = phy_probe;
>> new_driver->mdiodrv.driver.remove = phy_remove;
>> new_driver->mdiodrv.driver.owner = owner;
>> + new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
>>
>> retval = driver_register(&new_driver->mdiodrv.driver);
>> if (retval) {
>
>
> Humm, maybe i don't understand the issue correctly.
>
> When the MDIO bus is registered, we scan the bus looking for PHYs.
> When we find a PHY, we call phy_device_create(). That will then
> trigger the loading of the kernel module which should driver this phy.
>
> Sometime later, the PHY driver module gets loaded and calls
> phy_drivers_register() to register the list of IDs it supports. The
> driver core will then call phy_bus_match() to see if the newly loaded
> driver matches to a device we have created. If so, the driver will be
> associated to the device.
>
request_module() is synchronous and should return only once the init
function of the loaded module has been executed. Means in case of
phylib: all PHY drivers of the module have been registered

> Sometime later, the MAC tries to attach to the phy using
> phy_attach_direct(). If there is no driver associated to the device,
> we use the generic PHY driver.
>
> I thought the issue was the race condition between loading the module
> and the MAC attaching to it? We are getting the generic driver because
> the specific driver is still loading?
>
No. The issue happens before: For whatever reason the PHY driver doesn't
bind to the PHY device, even though they match. Therefore
phy_attach_direct() then binds the genphy driver.

In theory it shouldn't make a difference whether device or driver is
registered first. When a driver is registered it checks all unbound
devices on the same bus for a match.

Standard is asynchronous driver probing, therefore in case of phylib
(in at least a lot of cases) the driver probes the devices on the bus
once the PHY device was registered. This seems to fail and I have no
idea why.

It has been reported that loading the PHY driver module upfront fixes
the issue. Therefore I assumed it may help to let the PHY driver probe
for devices earlier (even though the device isn't even registered yet).
And indeed (see mail just sent by Marc) using synchronous probing
fixes the issue.
It seems to me that probing triggered from device registering and
probing triggered from driver registering somehow interfere.

Of course it's not very satisfying to have a fix but to not understand
the root cause of the issue. I add Greg and Rafael as maintainers of
the driver core to the discussion. Maybe they can shed some light on
the situation.

> If that really is the issue, i think phy_attach_direct() should try
> loading the module again, doing it synchronously. Only when it fails
> should the generic driver be associated to the device.
>
> Andrew
>
>
Heiner