Re: [PATCH 026/270] powerpc/eeh: Lock module while handling EEH event

From: Ben Hutchings
Date: Mon Nov 26 2012 - 21:18:39 EST


On Mon, 2012-11-26 at 14:55 -0200, Herton Ronaldo Krzesinski wrote:
> 3.5.7u1 -stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Gavin Shan <shangw@xxxxxxxxxxxxxxxxxx>
>
> commit feadf7c0a1a7c08c74bebb4a13b755f8c40e3bbc upstream.
>
> The EEH core talks to the PCI device driver to determine the recovery
> action (a plain reset, or PCI device removal). During that negotiation
> the driver might be unloaded, which in turn causes a kernel crash like
> the following:
>
> EEH: Detected PCI bus error on PHB#4-PE#10000
> EEH: This PCI device has failed 3 times in the last hour
> lpfc 0004:01:00.0: 0:2710 PCI channel disable preparing for reset
> Unable to handle kernel paging request for data at address 0x00000490
> Faulting instruction address: 0xd00000000e682c90
> cpu 0x1: Vector: 300 (Data Access) at [c000000fc75ffa20]
> pc: d00000000e682c90: .lpfc_io_error_detected+0x30/0x240 [lpfc]
> lr: d00000000e682c8c: .lpfc_io_error_detected+0x2c/0x240 [lpfc]
> sp: c000000fc75ffca0
> msr: 8000000000009032
> dar: 490
> dsisr: 40000000
> current = 0xc000000fc79b88b0
> paca = 0xc00000000edb0380 softe: 0 irq_happened: 0x00
> pid = 3386, comm = eehd
> enter ? for help
> [c000000fc75ffca0] c000000fc75ffd30 (unreliable)
> [c000000fc75ffd30] c00000000004fd3c .eeh_report_error+0x7c/0xf0
> [c000000fc75ffdc0] c00000000004ee00 .eeh_pe_dev_traverse+0xa0/0x180
> [c000000fc75ffe70] c00000000004ffd8 .eeh_handle_event+0x68/0x300
> [c000000fc75fff00] c0000000000503a0 .eeh_event_handler+0x130/0x1a0
> [c000000fc75fff90] c000000000020138 .kernel_thread+0x54/0x70
> 1:mon>
>
> The patch takes a reference on the corresponding driver module while
> the EEH core negotiates with the PCI device driver, so that the module
> can't be unloaded during that period and it is safe to call the
> driver's callbacks.
>
> Reported-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
> Signed-off-by: Gavin Shan <shangw@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> [ herton: backported for 3.5, adjusted driver assignments, return 0
> instead of NULL, assume dev is not NULL ]
> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@xxxxxxxxxxxxx>
[...]
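
For readers without the diff at hand, here is a rough user-space model of
the idea in the commit message: pin the driver module for the duration of
the callback and drop the reference afterwards. It is not the elided patch.
Every name in it (fake_module, fake_module_tryget, fake_module_put,
fake_module_unload, report_error, demo_cb_try_unload) is made up for
illustration. In the kernel the usual primitives for this are
try_module_get() and module_put(); this message does not show which calls
the patch actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loadable driver module. */
struct fake_module {
    int refcount;    /* references currently held by callers */
    bool unloading;  /* set once removal has started */
};

/* Take a reference unless the module is missing or already going away. */
static bool fake_module_tryget(struct fake_module *m)
{
    if (m == NULL || m->unloading)
        return false;
    m->refcount++;
    return true;
}

/* Drop a reference previously taken with fake_module_tryget(). */
static void fake_module_put(struct fake_module *m)
{
    assert(m->refcount > 0);
    m->refcount--;
}

/* Unloading is refused while anybody still holds a reference. */
static bool fake_module_unload(struct fake_module *m)
{
    if (m->refcount > 0)
        return false;
    m->unloading = true;
    return true;
}

typedef int (*error_cb)(void);

/*
 * Call a driver callback with the module pinned.
 * Returns the callback's result, or -1 if the module could not be pinned
 * (in which case the callback is never called).
 */
static int report_error(struct fake_module *m, error_cb cb)
{
    int ret;

    if (!fake_module_tryget(m))
        return -1;
    ret = cb();            /* module cannot go away while we are in here */
    fake_module_put(m);
    return ret;
}

/* Demo only: a callback during which somebody tries to unload the module. */
static struct fake_module *demo_target;

static int demo_cb_try_unload(void)
{
    /* 1 means the unload went through mid-callback, 0 means it was refused. */
    return fake_module_unload(demo_target) ? 1 : 0;
}
```

In this model an unload attempted from inside the callback is refused
because report_error() still holds its reference. Once the reference is
dropped the unload goes through, and any later report_error() bails out
with -1 instead of jumping into code that is no longer there.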

Greg, you probably want this in 3.4 and 3.6.

Ben.

--
Ben Hutchings
Never attribute to conspiracy what can adequately be explained by stupidity.
