Re: [REGRESION] Suspend hangs with 3.6-rc1 on Lenovo T60 notebook

From: Hans de Goede
Date: Thu Aug 16 2012 - 07:34:00 EST


This is a multi-part message in MIME format.Hi,

On 08/15/2012 09:59 PM, Rafael J. Wysocki wrote:
On Wednesday, August 15, 2012, Hans de Goede wrote:
Hi,

On 08/15/2012 07:13 AM, Miklos Szeredi wrote:
Suspend oopses in generic_ide_suspend() because dev_get_drvdata()
returns NULL (dev->p->driver_data == NULL) and this function is not
prepared for this.

I bisected it to 0998d063 (device-core: Ensure drvdata = NULL when no
driver is bound). Reverting it fixes suspend.


First of all, thanks for reporting and bisecting this. With that said,
I must say that this is very weird. The patch in question:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=0998d063

Only makes dev-drvdata NULL in 2 cases:
1) The probe method of the driver fails
2) The driver has been detached from the device by calling one of:
device_release_driver() or driver_detach()

Note that in both code paths dev->driver also gets set to NULL, and
other generic ide driver callbacks very much depend on that not being
NULL, ie:

static int generic_ide_remove(struct device *dev)
{
ide_drive_t *drive = to_ide_device(dev);
struct ide_driver *drv = to_ide_driver(dev->driver);

if (drv->remove)
drv->remove(drive);

return 0;
}

Also how can a drivers suspend callback get called if dev->driver is NULL,
since that callback would normally be "reached" through dev->driver, so
something weird is going on here ...

No, it wouldn't, because it is a bus type callback and it is invoked for
all devices whose bus type is ide_bus_type, regardless of whether or not
their driver field is NULL.

Ah right, these are bus_driver operations. That explains some things, so I've
done some more research asking myself: "Why does generic_ide_suspend(), which
is a *bus* op, call dev_get_drvdata?", the answer to that seems to be that
the ide subsystem is abusing (IMHO) drvdata to store per device bus_driver
data. Which I believe is not how drvdata is intended to be used.

With that said, the above knowledge has allowed me to write an (ugly) fix for
the regression Miklos is seeing. Miklos can you give the attached patch a
try please?

It clearly should check if drive is not NULL before using that pointer.

I assume you mean drive*r*, yes I agree that generic_ide_remove should
check for that. So who is going to write a patch for that?

Regards,

Hans