Re: [ANNOUNCE] 3.0.14-rt31 - ksoftirq running wild - FEC ethernet driver to blame?

From: Tim Sander
Date: Tue Jan 17 2012 - 09:38:06 EST


Hi

I have further input on ksoftirqd/0 using as much CPU as is available on an
ARM i.MX pcm043 platform, without load, with a 3.0.14-rt31 kernel and some
local platform adaptations.

> I was thinking about this ksoftirq0 running on max cpu. The context:
> > > > and the running wild ksoftirqd0 most probably after the kernel
> > > > message: "sched: RT throttling activated"
I think that the message "sched: RT throttling activated" and ksoftirqd
running at full CPU are possibly two separate errors. Btw., is there a way to
find out which processes were consuming this timeframe? It would be nice to add
info output showing which processes caused the throttling. Is it possible to
get this information out of the scheduler structure?
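
Until there is something better, a rough user-space sketch of what can be
checked today: the throttling budget and the tasks that run in an RT class.
This does not tell which task actually used up the budget. It assumes the
sched_rt_period_us/sched_rt_runtime_us sysctls and a procps-style ps (the
busybox ps may not know the cls and rtprio columns); the function names are
made up for illustration.

```shell
# Print the RT throttling budget: RT tasks may run for
# sched_rt_runtime_us out of every sched_rt_period_us.
# The optional argument overrides the sysctl directory.
rt_budget() {
    dir=${1:-/proc/sys/kernel}
    period=$(cat "$dir/sched_rt_period_us")
    runtime=$(cat "$dir/sched_rt_runtime_us")
    echo "period=${period}us runtime=${runtime}us"
}

# Filter "ps -eo pid,cls,rtprio,comm" output read from stdin:
# keep the header plus every task in SCHED_FIFO (FF) or SCHED_RR (RR).
rt_tasks() {
    awk 'NR == 1 || $2 == "FF" || $2 == "RR"'
}

# Usage on the target:
#   rt_budget
#   ps -eo pid,cls,rtprio,comm | rt_tasks
```

Any task in that list that spins without sleeping is a candidate for
triggering the throttling.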

> > > Hmm, that's not good. It means that an RT task is spinning too much.
> >
> > Mh, sorry i was to terse on that. This only happens after first boot on
> > UBIFS update, but it shows that somehow there seems to be a corner case
> > when throtteling is activated. Since this seems to be the reason for
> > ksoftirq0 running as much cpu as it gets. I just patched out switch to rt
> > throtteling and i will ask the mtd guys about the work they presumably do
> > in interrupt context which causes this throtteling in the first place.
Ok, after taking a detour into the UBI stuff I am now convinced that the UBI
subsystem is not the culprit for this throttling message. It's just that right
after UBI has been attached, the network settings are restored after a
firmware update, so it seems that this is what triggers the error.

I have been toying around with connman (connman.net), which manages network
connections on embedded devices. Stopping this daemon seems to do an
"ifconfig eth0 down" when it terminates. After a manual "ifconfig eth0 up",
ksoftirqd/0 uses as much CPU as it gets. Just doing an "ifconfig eth0 down &&
ifconfig eth0 up" without connman involved does not create 100% CPU load, so
somehow connman triggers this error pretty reliably.
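
To narrow down which softirq keeps ksoftirqd busy, one could compare two
snapshots of /proc/softirqs taken a few seconds apart while the CPU is
pegged. A minimal sketch, assuming /proc/softirqs is available on the target;
the helper name softirq_delta is hypothetical:

```shell
# Print, per softirq vector, how many times it was raised between two
# snapshots of /proc/softirqs (counts are summed over all CPUs).
softirq_delta() {
    awk '
        FNR == 1 { next }                    # skip the "CPU0 CPU1 ..." header
        {
            name = $1
            sub(/:$/, "", name)              # "NET_RX:" -> "NET_RX"
            sum = 0
            for (i = 2; i <= NF; i++) sum += $i
            if (NR == FNR) before[name] = sum        # first snapshot
            else print name, sum - before[name]      # second snapshot
        }
    ' "$1" "$2"
}

# Usage on the target:
#   cat /proc/softirqs > /tmp/before; sleep 5; cat /proc/softirqs > /tmp/after
#   softirq_delta /tmp/before /tmp/after
```

A vector whose count grows much faster than the others (NET_RX or TIMER, for
example) would hint at where to look in the driver.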

Also, the dmesg output seems to point to the FEC ethernet driver as the culprit
for this throttling message:
eth0: Freescale FEC PHY driver [Micrel KS8041] (mii_bus:phy_addr=1:00, irq=-1)
sched: RT throttling activated
PHY: 1:00 - Link is Up - 100/Full

Other output in dmesg right before ksoftirqd starts running wild:
eth0: no IPv6 routers present
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: Freescale FEC PHY driver [Micrel KS8041] (mii_bus:phy_addr=1:00, irq=-1)
eth0: no IPv6 routers present
PHY: 1:00 - Link is Up - 100/Full
eth0: Freescale FEC PHY driver [Micrel KS8041] (mii_bus:phy_addr=1:00, irq=-1)
FEC: MDIO read timeout
PHY: 1:00 - Link is Up - 100/Full
eth0: no IPv6 routers present
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: Freescale FEC PHY driver [Micrel KS8041] (mii_bus:phy_addr=1:00, irq=-1)
FEC: MDIO read timeout
PHY: 1:00 - Link is Up - 100/Full

It seems as if the polling of the PHY might be involved in the above problem,
so I'd like to test whether the problems go away with a PHY irq defined. I
have the interrupt line of the PHY connected, but somehow I got stuck in all
these layers trying to work out how to set
phy_dev->irq = gpio_to_irq(IMX_GPIO_NR(2,7));
in the board definition mach-pcm043.c. Is there an example of how to define
the PHY irq?

Best regards
Tim