Re: PROBLEM: modprobe hang at startup (3.8.x, 3.9.x, IBM x3550)

From: Jean Delvare
Date: Wed May 15 2013 - 05:20:58 EST


Hi Robert,

Adding the linux-i2c list to Cc.

On Wed, 15 May 2013 09:16:26 +1000, Robert Norris wrote:
> On Mon, May 13, 2013 at 11:22:32AM +1000, Robert Norris wrote:
> > We have a number of IBM x3550 servers (Intel 5000-series). They've
> > been running 3.7.2 fine.
> >
> > In the last week I've run 3.8.11, 3.8.12 and 3.9.2 on them. All have
> > long hangs at boot, and later hung tasks in modprobe.
>
> I bisected this and tracked it to this commit:
>
> commit 6676a847d48ac48908cf467b42da9045b5463a6e
> Author: Jean Delvare <khali@xxxxxxxxxxxx>
> Date: Sun Dec 16 21:11:55 2012 +0100
>
> i2c-i801: Enable interrupts for all post-ICH5 chips
>
> I did not receive a single bug report after interrupt support was
> added for a limited number of chips. So I'd say the code is good and
> should be enabled for all supported chips, that is: ICH5 and later.
>
> Signed-off-by: Jean Delvare <khali@xxxxxxxxxxxx>
> Reviewed-by: Daniel Kurtz <djkurtz@xxxxxxxxxxxx>
>
> I've tested by building 3.9.2 with that single commit reverted, and it
> boots without issue.

Thanks a lot for reporting and even more for bisecting it, I know it
takes time. I apologize for the trouble. I suppose I should have been a
bit more cautious with the 63xxESB chips as they are a different family
of hardware.

> According to lspci I have:
>
> 00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
>
> Which has PCI ID 0x269b (ie PCI_DEVICE_ID_INTEL_ESB2_17).


Can you share the full output of lspci -s 00:1f.3 -vv?

I'm also curious if the SMBus controller shares its interrupt line with
another chip. /proc/interrupts should tell but you'll have to make one
of your systems hang again.
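
If it helps, here is a minimal Python sketch of that check. It assumes
the driver's handler shows up in /proc/interrupts as "i801_smbus" (an
assumption; adjust if your output shows another name). The function
name irq_sharers is only illustrative.

```python
def irq_sharers(interrupts_text, handler="i801_smbus"):
    """Find the IRQ line that lists `handler`.

    Returns (irq_label, other_handlers_on_that_line), or None if the
    handler does not appear. `handler` defaults to the name the
    i2c-i801 driver is assumed to register under.
    """
    for line in interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        if not sep:
            continue  # header row with the CPU column names
        # Handlers sharing a line come last, separated by commas.
        # In the first chunk the handler is the last whitespace token,
        # after the per-CPU counts and the interrupt controller type.
        names = [chunk.split()[-1] for chunk in rest.split(",") if chunk.strip()]
        if handler in names:
            return irq.strip(), [n for n in names if n != handler]
    return None
```

Call it as irq_sharers(open("/proc/interrupts").read()). An empty list
in the result means nothing else is registered on that interrupt line.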

> For now I will either revert this commit in my kernel builds or
> blacklist the module on these machines (I haven't decided which I prefer
> yet).

You can also pass the parameter disable_features=0x10 to the i2c-i801
driver. This disables interrupt support without having to rebuild
the driver. I suppose this could be documented in more detail in
modinfo; I'll work on that.

> Obviously, I can reproduce this reliably, and am happy to test.

Thanks for the offer. Right now I am stuck in bed and must take some
rest. When I feel better I'll see if I can gain access to systems with
Intel 63xxESB chips to try and reproduce the hang you're seeing. I'll
also take a look at the datasheets again to see if any difference stands
out.

For the time being I plan to simply disable interrupt support again for
the ESB chips, until we fully understand what happens on your systems.

As far as debugging goes, please tell me if you have any I2C/SMBus
slave device driver loaded (check in /sys/bus/i2c/drivers.) Loading the
i2c-i801 driver doesn't do much on its own if there are no slave device
drivers using it.
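
A rough Python sketch for collecting that information, assuming the
usual sysfs layout where each driver directory holds one symlink per
bound device plus an optional "module" symlink. The function name
i2c_drivers is only illustrative.

```python
import os


def i2c_drivers(root="/sys/bus/i2c/drivers"):
    """Map each registered I2C driver name to the devices bound to it."""
    result = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue
        # Bound devices are assumed to appear as symlinks (e.g. "0-002d").
        # The "module" symlink points at the owning module, not a device.
        result[name] = sorted(
            entry
            for entry in os.listdir(path)
            if entry != "module" and os.path.islink(os.path.join(path, entry))
        )
    return result
```

A driver with an empty list is loaded but has no device bound to it.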

Thanks,
--
Jean Delvare