Re: 2.6.19-rc5-mm2 (Oops in class_device_remove_attrs during nodemgr_remove_host)

From: Stefan Richter
Date: Sat Nov 18 2006 - 16:45:52 EST


I added the following patch:

--- linux-2.6.19-rc5-mm2.orig/drivers/ieee1394/nodemgr.c 2006-11-18 21:18:05.000000000 +0100
+++ linux-2.6.19-rc5-mm2/drivers/ieee1394/nodemgr.c 2006-11-18 21:33:44.000000000 +0100
@@ -798,8 +798,9 @@ static void nodemgr_remove_uds(struct no

static void nodemgr_remove_ne(struct node_entry *ne)
{
- struct device *dev = &ne->device;
+ struct device *dev;

+ HPSB_DEBUG("****** nodemgr_remove_ne");
dev = get_device(&ne->device);
if (!dev)
return;
@@ -817,7 +818,10 @@ static void nodemgr_remove_ne(struct nod

static int __nodemgr_remove_host_dev(struct device *dev, void *data)
{
- nodemgr_remove_ne(container_of(dev, struct node_entry, device));
+ struct node_entry *ne = container_of(dev, struct node_entry, device);
+
+ HPSB_DEBUG("****** ne = %p", ne);
+ nodemgr_remove_ne(ne);
return 0;
}

@@ -906,6 +910,7 @@ static struct node_entry *nodemgr_create
HPSB_DEBUG("%s added: ID:BUS[" NODE_BUS_FMT "] GUID[%016Lx]",
(host->node_id == nodeid) ? "Host" : "Node",
NODE_BUS_ARGS(host, nodeid), (unsigned long long)guid);
+ HPSB_DEBUG("****** ne = %p", ne);

return ne;



With this I get the following kernel log on a PC with two FireWire cards:

# modprobe ohci1394

Nov 18 21:38:05 shuttle kernel: ieee1394: Initialized config rom entry `ip1394'
Nov 18 21:38:05 shuttle kernel: ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[17] MMIO=[e7004000-e70047ff] Max Packet=[4096] IR/IT contexts=[4/8]
Nov 18 21:38:05 shuttle kernel: ohci1394: fw-host1: OHCI-1394 1.0 (PCI): IRQ=[19] MMIO=[e7006000-e70067ff] Max Packet=[2048] IR/IT contexts=[8/8]
Nov 18 21:38:07 shuttle kernel: ieee1394: Error parsing configrom for node 0-01:1023
Nov 18 21:38:07 shuttle kernel: ieee1394: Host added: ID:BUS[0-02:1023] GUID[0001080000002d02]
Nov 18 21:38:07 shuttle kernel: ieee1394: ****** ne = f61f609c
Nov 18 21:38:07 shuttle kernel: eth1394: eth1: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
Nov 18 21:38:07 shuttle kernel: eth1394: eth2: IEEE-1394 IPv4 over 1394 Ethernet (fw-host1)
Nov 18 21:38:07 shuttle kernel: ieee1394: Host added: ID:BUS[1-00:1023] GUID[00301bac00002ba4]
Nov 18 21:38:07 shuttle kernel: ieee1394: ****** ne = f50db964

# modprobe -r ohci1394

Nov 18 21:38:18 shuttle kernel: ieee1394: ****** ne = f627952c
Nov 18 21:38:18 shuttle kernel: ieee1394: ****** nodemgr_remove_ne
Nov 18 21:38:18 shuttle kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 000000b8

We never created a "ne" at f627952c; the only two created above are at
f61f609c and f50db964. The bogus address is obtained by the
container_of() in __nodemgr_remove_host_dev(). Ergo the list of device
pointers which device_for_each_child() in nodemgr_remove_host_dev() is
iterating over, i.e. the host->device.klist_children, was corrupted
somewhere.
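To illustrate why a bad entry in the children list shows up as a bad "ne":
container_of() is plain pointer arithmetic and validates nothing. Below is a
minimal userspace sketch, not the kernel code. The struct layouts are made up
for illustration, the macro is a simplified form of the kernel's, and
ne_from_dev() and container_of_demo() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; the field layout is invented. */
struct device {
	int dummy;
};

struct node_entry {
	unsigned long long guid;	/* first, so 'device' has a nonzero offset */
	struct device device;		/* embedded, as in the real node_entry */
};

/* Simplified container_of: member pointer minus the member's offset. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mirrors the lookup done in __nodemgr_remove_host_dev(). */
struct node_entry *ne_from_dev(struct device *dev)
{
	return container_of(dev, struct node_entry, device);
}

/* Round trip: the embedded device leads back to its own node_entry. */
void container_of_demo(void)
{
	struct node_entry ne = { .guid = 0x0001080000002d02ULL };

	assert(ne_from_dev(&ne.device) == &ne);
	assert(ne_from_dev(&ne.device)->guid == ne.guid);
}
```

The result is only as good as the device pointer handed in. If the list
yields a pointer that is not the device member of a real node_entry, the
computed "ne" is just that pointer minus a constant, which is consistent with
the never-created f627952c seen in the log.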

Nov 18 21:38:18 shuttle kernel: printing eip:
Nov 18 21:38:18 shuttle kernel: f9057b4c
Nov 18 21:38:18 shuttle kernel: *pde = 00000000
Nov 18 21:38:18 shuttle kernel: Oops: 0000 [#1]
Nov 18 21:38:18 shuttle kernel: PREEMPT SMP
Nov 18 21:38:18 shuttle kernel: last sysfs file: /class/printer/lp0/dev
Nov 18 21:38:18 shuttle kernel: Modules linked in: eth1394 ohci1394 ieee1394 nvidia(P) nfsd exportfs lockd sunrpc snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd lp af_packet 8139too mii loop via_agp agpgart uhci_hcd
Nov 18 21:38:18 shuttle kernel: CPU: 0
Nov 18 21:38:18 shuttle kernel: EIP: 0060:[pg0+950528844/1067602944] Tainted: P VLI
Nov 18 21:38:18 shuttle kernel: EIP: 0060:[<f9057b4c>] Tainted: P VLI
Nov 18 21:38:18 shuttle kernel: EFLAGS: 00210213 (2.6.19-rc5-mm2 #15)
Nov 18 21:38:18 shuttle kernel: EIP is at nodemgr_remove_ne+0x4c/0x90 [ieee1394]
Nov 18 21:38:18 shuttle kernel: eax: 00000000 ebx: f6279568 ecx: 00003a2d edx: 0000037a
Nov 18 21:38:18 shuttle kernel: esi: f627952c edi: f9057b90 ebp: f57a3dd8 esp: f57a3db8
Nov 18 21:38:18 shuttle kernel: ds: 007b es: 007b ss: 0068
Nov 18 21:38:18 shuttle kernel: Process modprobe (pid: 5967, ti=f57a2000 task=f576e0f0 task.ti=f57a2000)
Nov 18 21:38:18 shuttle kernel: Stack: f6279568 f627952c 00000020 0000037a f8c7de00 00000000 f627952c f57a3dfc
Nov 18 21:38:18 shuttle kernel: f57a3dec f9057bb5 f627952c f627952c 00000000 f57a3e18 c02313b2 f6279568
Nov 18 21:38:18 shuttle kernel: 00000000 f73ea0c4 f73ea0e4 f6279598 c02d8b18 f73ea0c4 f73ea000 f73ea000
Nov 18 21:38:18 shuttle kernel: Call Trace:
Nov 18 21:38:19 shuttle kernel: [show_trace_log_lvl+47/80] show_trace_log_lvl+0x2f/0x50
Nov 18 21:38:19 shuttle kernel: [<c010400f>] show_trace_log_lvl+0x2f/0x50
Nov 18 21:38:19 shuttle kernel: [show_stack_log_lvl+151/192] show_stack_log_lvl+0x97/0xc0
Nov 18 21:38:19 shuttle kernel: [<c01040f7>] show_stack_log_lvl+0x97/0xc0
Nov 18 21:38:19 shuttle kernel: [show_registers+453/832] show_registers+0x1c5/0x340
Nov 18 21:38:19 shuttle kernel: [<c0104355>] show_registers+0x1c5/0x340
Nov 18 21:38:19 shuttle kernel: [die+298/544] die+0x12a/0x220
Nov 18 21:38:19 shuttle kernel: [<c010469a>] die+0x12a/0x220
Nov 18 21:38:19 shuttle kernel: [do_page_fault+873/1632] do_page_fault+0x369/0x660
Nov 18 21:38:19 shuttle kernel: [<c0114649>] do_page_fault+0x369/0x660
Nov 18 21:38:19 shuttle kernel: [error_code+124/132] error_code+0x7c/0x84
Nov 18 21:38:19 shuttle kernel: [<c02dadfc>] error_code+0x7c/0x84
Nov 18 21:38:19 shuttle kernel: [pg0+950528949/1067602944] __nodemgr_remove_host_dev+0x25/0x30 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [<f9057bb5>] __nodemgr_remove_host_dev+0x25/0x30 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [device_for_each_child+50/96] device_for_each_child+0x32/0x60
Nov 18 21:38:19 shuttle kernel: [<c02313b2>] device_for_each_child+0x32/0x60
Nov 18 21:38:19 shuttle kernel: [pg0+950528994/1067602944] nodemgr_remove_host_dev+0x22/0x90 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [<f9057be2>] nodemgr_remove_host_dev+0x22/0x90 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [pg0+950536711/1067602944] nodemgr_remove_host+0x37/0x40 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [<f9059a07>] nodemgr_remove_host+0x37/0x40 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [pg0+950514108/1067602944] __unregister_host+0x8c/0xd0 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [<f90541bc>] __unregister_host+0x8c/0xd0 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [pg0+950516502/1067602944] highlevel_remove_host+0x36/0x60 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [<f9054b16>] highlevel_remove_host+0x36/0x60 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [pg0+950512675/1067602944] hpsb_remove_host+0x43/0x70 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [<f9053c23>] hpsb_remove_host+0x43/0x70 [ieee1394]
Nov 18 21:38:19 shuttle kernel: [pg0+946040328/1067602944] ohci1394_pci_remove+0x68/0x240 [ohci1394]
Nov 18 21:38:19 shuttle kernel: [<f8c0fe08>] ohci1394_pci_remove+0x68/0x240 [ohci1394]
Nov 18 21:38:19 shuttle kernel: [pci_device_remove+70/80] pci_device_remove+0x46/0x50
Nov 18 21:38:19 shuttle kernel: [<c01fe976>] pci_device_remove+0x46/0x50
Nov 18 21:38:19 shuttle kernel: [__device_release_driver+174/192] __device_release_driver+0xae/0xc0
Nov 18 21:38:19 shuttle kernel: [<c023377e>] __device_release_driver+0xae/0xc0
Nov 18 21:38:19 shuttle kernel: [driver_detach+280/288] driver_detach+0x118/0x120
Nov 18 21:38:19 shuttle kernel: [<c0233908>] driver_detach+0x118/0x120
Nov 18 21:38:19 shuttle kernel: [bus_remove_driver+68/112] bus_remove_driver+0x44/0x70
Nov 18 21:38:19 shuttle kernel: [<c0232c44>] bus_remove_driver+0x44/0x70
Nov 18 21:38:19 shuttle kernel: [driver_unregister+18/32] driver_unregister+0x12/0x20
Nov 18 21:38:19 shuttle kernel: [<c0233bd2>] driver_unregister+0x12/0x20
Nov 18 21:38:19 shuttle kernel: [pci_unregister_driver+21/48] pci_unregister_driver+0x15/0x30
Nov 18 21:38:19 shuttle kernel: [<c01fecf5>] pci_unregister_driver+0x15/0x30
Nov 18 21:38:19 shuttle kernel: [pg0+946042066/1067602944] ohci1394_cleanup+0x12/0x14 [ohci1394]
Nov 18 21:38:19 shuttle kernel: [<f8c104d2>] ohci1394_cleanup+0x12/0x14 [ohci1394]
Nov 18 21:38:19 shuttle kernel: [sys_delete_module+342/384] sys_delete_module+0x156/0x180
Nov 18 21:38:19 shuttle kernel: [<c0142aa6>] sys_delete_module+0x156/0x180
Nov 18 21:38:19 shuttle kernel: [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
Nov 18 21:38:19 shuttle kernel: [<c01031f6>] sysenter_past_esp+0x5f/0x85
Nov 18 21:38:19 shuttle kernel: =======================
Nov 18 21:38:19 shuttle kernel: Code: c7 85 c0 89 c3 74 60 8b 06 8b 56 04 89 44 24 10 89 54 24 14 0f b7 46 14 89 c2 83 e0 3f c1 ea 06 89 44 24 08 89 54 24 0c 8b 46 10 <8b> 80 b8 00 00 00 c7 04 24 2c b5 07 f9 89 44 24 04 e8 3e 6c 0c
Nov 18 21:38:19 shuttle kernel: EIP: [pg0+950528844/1067602944] nodemgr_remove_ne+0x4c/0x90 [ieee1394] SS:ESP 0068:f57a3db8
Nov 18 21:38:19 shuttle kernel: EIP: [<f9057b4c>] nodemgr_remove_ne+0x4c/0x90 [ieee1394] SS:ESP 0068:f57a3db8



The same test on Linux 2.6.19-rc4 plus what constitutes git-ieee1394.patch
plus the diagnostics patch removes both node entries without trouble:

# modprobe ohci1394

Nov 18 22:09:25 shuttle kernel: ieee1394: Initialized config rom entry `ip1394'
Nov 18 22:09:25 shuttle kernel: ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[17] MMIO=[e7004000-e70047ff] Max Packet=[4096] IR/IT contexts=[4/8]
Nov 18 22:09:25 shuttle kernel: ohci1394: fw-host1: OHCI-1394 1.0 (PCI): IRQ=[19] MMIO=[e7006000-e70067ff] Max Packet=[2048] IR/IT contexts=[8/8]
Nov 18 22:09:27 shuttle kernel: ieee1394: Error parsing configrom for node 0-01:1023
Nov 18 22:09:27 shuttle kernel: ieee1394: Host added: ID:BUS[0-02:1023] GUID[0001080000002d02]
Nov 18 22:09:27 shuttle kernel: ieee1394: ****** ne = f505512c
Nov 18 22:09:27 shuttle kernel: eth1394: eth1: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
Nov 18 22:09:27 shuttle kernel: eth1394: eth2: IEEE-1394 IPv4 over 1394 Ethernet (fw-host1)
Nov 18 22:09:27 shuttle kernel: ieee1394: Host added: ID:BUS[1-00:1023] GUID[00301bac00002ba4]
Nov 18 22:09:27 shuttle kernel: ieee1394: ****** ne = f5140d80

# modprobe -r ohci1394

Nov 18 22:09:31 shuttle kernel: ieee1394: ****** ne = f5140d80
Nov 18 22:09:31 shuttle kernel: ieee1394: ****** nodemgr_remove_ne
Nov 18 22:09:31 shuttle kernel: ieee1394: Node removed: ID:BUS[1-00:1023] GUID[00301bac00002ba4]
Nov 18 22:09:32 shuttle kernel: ieee1394: ****** ne = f505512c
Nov 18 22:09:32 shuttle kernel: ieee1394: ****** nodemgr_remove_ne
Nov 18 22:09:32 shuttle kernel: ieee1394: Node removed: ID:BUS[0-02:1023] GUID[0001080000002d02]

It seems like one of the patches in -mm overwrites a device's list of
children with junk.

Mattia, *if* your machine is able to compile and reboot into new
kernels really quickly, it would be nice if you could bisect the
-mm patches. I suppose the following ones are those to concentrate
on first:

broken-out/gregkh-driver-config_sysfs_deprecated-bus.patch
broken-out/gregkh-driver-config_sysfs_deprecated-class.patch
broken-out/gregkh-driver-config_sysfs_deprecated-device.patch
broken-out/gregkh-driver-config_sysfs_deprecated-PHYSDEV.patch
broken-out/gregkh-driver-driver-link-sysfs-timing.patch
broken-out/gregkh-driver-sysfs-crash-debugging.patch
broken-out/gregkh-driver-udev-compatible-hack.patch

But hold on, I will do one other thing after I send this message: I'll
test -mm with CONFIG_SYSFS_DEPRECATED=y.
--
Stefan Richter
-=====-=-==- =-== =--=-
http://arcgraph.de/sr/