Re: [ANNOUNCE] 3.0.14-rt31 - ksoftirq running wild - FEC ethernet driver to blame? Yep

From: Tim Sander
Date: Wed Feb 01 2012 - 18:18:03 EST


Hi Steven
> Is the system still usable when this happens? If so, can you configure
> in ftrace, and run a trace on what ksoftirq is doing:
Well, it's slooooooooooooooooow since it's only 5% of a 500 MHz ARMv6 CPU.
So I can easily type faster than this thing echoes characters on a serial console :-)
> mkdir /debug
> mount -t debugfs nodev /debug
> cd /debug/tracing
> echo <pid-of-ksoftirq> > set_ftrace_pid
> echo function > current_tracer
> cat trace
Well, I tried the complete function tracer and I think the system load is just too high
for this system, but I will give it a try as soon as I see this error again.
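
The quoted steps can be wrapped in a small helper so they are quick to run the next time the error shows up. This is only a sketch: `trace_pid` and `TRACING_DIR` are made-up names for illustration, and it assumes debugfs is already mounted on /debug as in the quoted commands.

```shell
#!/bin/sh
# Sketch: restrict the function tracer to one PID, as in the quoted steps.
# TRACING_DIR and trace_pid are hypothetical names, not part of ftrace.
TRACING_DIR=${TRACING_DIR:-/debug/tracing}

trace_pid() {
    pid=$1
    # Refuse anything that is not a plain decimal PID.
    case $pid in
        ''|*[!0-9]*) echo "usage: trace_pid <pid>" >&2; return 1 ;;
    esac
    # Set the PID filter first so the tracer never runs unfiltered.
    echo "$pid" > "$TRACING_DIR/set_ftrace_pid" || return 1
    echo function > "$TRACING_DIR/current_tracer" || return 1
}
```

Possible usage, assuming `pgrep` exists on the target (it may not on a minimal busybox system): `trace_pid "$(pgrep ksoftirqd | head -n 1)" && cat "$TRACING_DIR/trace"`. Filtering by PID before enabling the tracer should keep the overhead well below that of tracing every function system-wide.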

When toying around with the hw debugger, I think it somehow runs into do_coredump
when this error hits and then somehow loops, but since I was feeding the wrong
symbol table to my hw debugger, all this stuff looked even weirder today 8-/.

I was also toying around with setting the PHY timeout in the driver and
hacking in the PHY interrupt, but found nothing conclusive.

Best regards
Tim

dmesg output with the PHY IRQ enabled. Either my hackish interrupt setting is not
working or the FEC driver has a problem with PHY interrupts... dunno:

nf_conntrack version 0.5.0 (1979 buckets, 7916 max)
fec_stop : Graceful transmit stop did not complete !
sched: RT throttling activated
FEC: MDIO read timeout
PHY: 1:00 - Link is Down
irq 103: nobody cared (try booting with the "irqpoll" option)
Backtrace:
[<c002de30>] (dump_backtrace+0x0/0x110) from [<c024d780>] (dump_stack+0x18/0x1c)
r6:00000000 r5:c794a2e0 r4:c031856c r3:00000000
[<c024d768>] (dump_stack+0x0/0x1c) from [<c0070930>] (__report_bad_irq.clone.5+0x2c/0xdc)
[<c0070904>] (__report_bad_irq.clone.5+0x0/0xdc) from [<c0070bf0>] (note_interrupt+0x19c/0x244)
r5:c794a2e0 r4:c0318544
[<c0070a54>] (note_interrupt+0x0/0x244) from [<c006f724>] (irq_thread+0xf0/0x1f4)
[<c006f634>] (irq_thread+0x0/0x1f4) from [<c0057298>] (kthread+0x8c/0x94)
[<c005720c>] (kthread+0x0/0x94) from [<c00413d4>] (do_exit+0x0/0x2d8)
r7:00000013 r6:c00413d4 r5:c005720c r4:c7bd9904
handlers:
[<c006f4c8>] irq_default_primary_handler threaded [<c01a1660>] phy_interrupt
Disabling IRQ #103
FEC: MDIO write timeout
init: avahi-autoip main process (423) terminated with status 1
init: avahi-autoip main process ended, respawning
eth0: Freescale FEC PHY driver [Micrel KS8041] (mii_bus:phy_addr=1:00, irq=103)
ADDRCONF(NETDEV_UP): eth0: link is not ready
PHY: 1:00 - Link is Up - 100/Full
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
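
One cheap way to check whether the PHY interrupt fires at all, or fires continuously, is to sample its row in /proc/interrupts twice. A rough sketch: `irq_line` is a made-up helper name, and the optional file argument exists only so it can be tried against a saved copy.

```shell
#!/bin/sh
# Sketch: print the /proc/interrupts row for one IRQ number.
# irq_line is a hypothetical helper; the second argument defaults to
# /proc/interrupts and can point at a saved copy instead.
irq_line() {
    irq=$1
    file=${2:-/proc/interrupts}
    # Rows look like " 103:   <count>   <chip>   <name>", so the first
    # whitespace-separated field is the IRQ number followed by a colon.
    awk -v want="$irq:" '$1 == want { print }' "$file"
}
```

Possible usage: `irq_line 103; sleep 1; irq_line 103`. A count that climbs very quickly between the two samples would be consistent with a line that stays asserted and is never acknowledged, which is the kind of situation that typically ends in the "nobody cared" message above.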
