Re: PROBLEM: modprobe hang at startup (3.8.x, 3.9.x, IBM x3550)

From: Robert Norris
Date: Wed May 15 2013 - 07:27:55 EST


Hi Jean,

On Wed, May 15, 2013 at 11:20:44AM +0200, Jean Delvare wrote:
> Thanks a lot for reporting and even more for bisecting it, I know it
> takes time. I apologize for the trouble. I suppose I should have been
> a bit more cautious with the 63xxESB chips as they are a different
> family of hardware.

No problem! It was kind of fun actually ;)

> Can you share the full output of lspci -s 00:1f.3 -vv?

00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
	Subsystem: IBM Device 02dd
	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin B routed to IRQ 0
	Region 4: I/O ports at 0440 [size=32]

> I'm also curious if the SMBus controller shares its interrupt line
> with another chip. /proc/interrupts should tell but you'll have to
> make one of your systems hang again.

I'm not sure how to read it, so here it is (3.9.2, immediately after
boot, no options to i2c_i801):

       CPU0   CPU1   CPU2   CPU3
  0:     42      0      0      0  IO-APIC-edge     timer
  1:      0      0      0      0  IO-APIC-edge     i8042
  4:      1      1      0      0  IO-APIC-edge
  8:      0      1      0      0  IO-APIC-edge     rtc0
  9:      0      0      0      0  IO-APIC-fasteoi  acpi
 14:      0      0      0      0  IO-APIC-edge     ata_piix
 15:      0      0      0      0  IO-APIC-edge     ata_piix
 17:   1225   1124   1113   1111  IO-APIC-fasteoi  aacraid
 20:      0      0      0      0  IO-APIC-fasteoi  i801_smbus
 22:      0      0      0      0  IO-APIC-fasteoi  uhci_hcd:usb2, radeon
 23:     25     21     27     29  IO-APIC-fasteoi  uhci_hcd:usb1, uhci_hcd:usb3, ehci_hcd:usb4
 41:     79      8      5      4  PCI-MSI-edge     eth2
 42:      1      2      1      4  PCI-MSI-edge     eth3
 43:      0      2      1      1  PCI-MSI-edge     ioat-msi
 44:     98    107    111    111  PCI-MSI-edge     eth1
 45:   1178   1210   1218   1215  PCI-MSI-edge     eth0
NMI:      4      5      3      4  Non-maskable interrupts
LOC:   3685   3953   6895   8014  Local timer interrupts
SPU:      0      0      0      0  Spurious interrupts
PMI:      4      5      3      4  Performance monitoring interrupts
IWI:      0      0      0      0  IRQ work interrupts
RTR:      0      0      0      0  APIC ICR read retries
RES:   6352   5546   6942   7790  Rescheduling interrupts
CAL:    975   1256    973   1488  Function call interrupts
TLB:    682    964    732   1003  TLB shootdowns
TRM:      0      0      0      0  Thermal event interrupts
THR:      0      0      0      0  Threshold APIC interrupts
MCE:      0      0      0      0  Machine check exceptions
MCP:      1      1      1      1  Machine check polls
ERR:      0
MIS:      0
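
For anyone else unsure how to read this: each numbered row is one IRQ
line, followed by a per-CPU count, the controller/trigger type, and a
comma-separated list of the handlers registered on that line. More than
one name on a row means the line is shared. The sketch below is one way
to check that mechanically. It assumes the layout shown above (one
header row of CPU names, the type as a single token), which may differ
on other kernel versions; the function names are made up for
illustration.

```python
def parse_interrupts(text):
    """Map each numeric IRQ label to the handler names registered on it."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    handlers = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue                        # skip NMI, LOC, ERR, ...
        fields = rest.split()
        # Assumed layout: ncpu counts, one type token, then handler names.
        names = " ".join(fields[ncpu + 1:])
        handlers[label] = [n.strip() for n in names.split(",") if n.strip()]
    return handlers


def sharers(text, driver):
    """Return (irq, other handlers on that line) for driver, or None."""
    for irq, names in parse_interrupts(text).items():
        if driver in names:
            return irq, [n for n in names if n != driver]
    return None


# Three rows copied from the output above.
SAMPLE = """CPU0 CPU1 CPU2 CPU3
20: 0 0 0 0 IO-APIC-fasteoi i801_smbus
22: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2, radeon
NMI: 4 5 3 4 Non-maskable interrupts
"""
print(sharers(SAMPLE, "i801_smbus"))
```

On the rows above this reports IRQ 20 with an empty list of sharers,
i.e. i801_smbus has the line to itself in this capture, whereas radeon
shares IRQ 22 with uhci_hcd:usb2.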

> You can also pass parameter disable_features=0x10 to the i2c-i801
> driver, this will disable interrupt support without having to rebuild
> the driver. I suppose this could be documented in more details in
> modinfo, I'll work on that.
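
Since disable_features is a bitmask, a small decoder can make a given
value easier to read. This is only a sketch: the thread confirms that
0x10 disables interrupt support, but the other bit meanings in the
table are an assumption from memory of the driver's parameter
description and should be checked against "modinfo i2c-i801" on the
kernel in question.

```python
# Only 0x10 is confirmed in this thread; the other entries are assumed.
DISABLE_FEATURES = {
    0x01: "SMBus PEC",
    0x02: "block buffer",
    0x08: "I2C block read",
    0x10: "interrupts",
}


def decode_disable_features(mask):
    """Return (names of known features disabled, leftover unknown bits)."""
    known = [name for bit, name in sorted(DISABLE_FEATURES.items())
             if mask & bit]
    unknown = mask & ~sum(DISABLE_FEATURES)   # sum of keys = all known bits
    return known, unknown


print(decode_disable_features(0x10))
```

Values can be OR-ed together, so 0x11 would decode to both "SMBus PEC"
and "interrupts" under the table assumed here.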

I went with blacklisting for now because this driver doesn't appear to
be doing anything useful for us (sensors etc are working without it).
I'll confess to not really knowing much about its purpose though.

> Thanks for the offer. Right now I am stuck in bed and must take some
> rest. When I feel better I'll see if I can gain access to systems with
> Intel 63xxESB chips to try and reproduce the hang you're seeing. I'll
> also take a look at the datasheets again to see if any difference
> stands out.

We'd be happy to give you access to one of our x3550s if you like (the
same one I did the bisect on). We'd move it outside our production
network and reinstall it and you'd be free to poke and prod and crash it
as much as you like. Let me know when/if you're interested and we'll
make it happen. No hurry from our end though, it's a barely-used machine
and will happily sit there waiting. Get your rest first!

> As far as debugging goes, please tell me if you have any I2C/SMBus
> slave device driver loaded (check in /sys/bus/i2c/drivers.) Loading the
> i2c-i801 driver doesn't do much on its own if there are no slave device
> drivers using it.

$ modprobe i2c-i801 disable_features=0x10
$ dmesg | tail
...
[28876.193408] i801_smbus 0000:00:1f.3: Interrupt disabled by user
[28876.201168] ics932s401 4-0069: ics932s401 chip found
$ ls /sys/bus/i2c/drivers
dummy ics932s401
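
The ls above only shows which slave drivers are registered, not which
ones actually have a device bound. A rough sketch of how to list both,
assuming the usual sysfs convention that bound devices appear as
symlinks (such as 4-0069) inside each driver's directory; the function
name is hypothetical.

```python
import os


def bound_i2c_devices(drivers_dir="/sys/bus/i2c/drivers"):
    """Return {driver name: [names of devices bound to it]}."""
    result = {}
    for driver in sorted(os.listdir(drivers_dir)):
        path = os.path.join(drivers_dir, driver)
        if not os.path.isdir(path):
            continue
        # Bound devices are symlinks; "module" is a symlink too, so skip it.
        result[driver] = sorted(
            entry for entry in os.listdir(path)
            if entry != "module" and os.path.islink(os.path.join(path, entry))
        )
    return result
```

Calling bound_i2c_devices() with no argument reads the live sysfs tree;
a different directory can be passed in for testing.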

Thanks for your help!

Cheers,
Rob.