Re: linux-3.6.11-rt30 smoke test on ARM

From: Frank Rowand
Date: Thu Mar 21 2013 - 16:26:51 EST


On 03/12/13 17:44, Frank Rowand wrote:
> On 03/11/13 10:34, Sebastian Andrzej Siewior wrote:
>> * Frank Rowand | 2013-03-07 20:03:18 [-0800]:
>>
>>> panda boot often fails due to a usb timeout, while sending a command on
>>> behalf of the smsc95xx ethernet driver.
>>>
>>> This patch is a temporary hack to force a retry when the timeout occurs.
>>
>> It looks like you overrun the chip for some reason. Can you reproduce it
>> on mainline? They added a few delays on register read(); that might do
>> the trick.
>
> Yes, I can reproduce it on mainline.
>
> Here is the current state of my debugging:
>
> The problem usually occurs within three boot attempts. But it has also
> taken eight boot attempts to see the problem. I do not know the maximum
> number of boots required to see the problem, so I cannot state with
> certainty that a given kernel version does not have the problem. If the
> boot fails then I can state with certainty that the given kernel
> version has the problem.
>
> Given that level of uncertainty, I know:
>
> v3.5 does not appear to have the problem
> v3.6-rc1 has the problem
> v3.6 has the problem
> v3.7 has the problem
> v3.8 does not appear to have the problem
> v3.9-rc1 fails to build
>
> I thought I had bisected the problem to a specific commit, but wanting
> to be sure of it, I did extra boots of what should have been the last
> good commit. On the 7th boot, that kernel version had the problem.
>
> I'll probably redo the bisect, but have not had time to do so yet.

I did the bisect again, with more boot tests per bisect point, and found
the commit to blame. Hopefully the problem will be resolved in the
thread where I report the bisect:

https://lkml.org/lkml/2013/3/20/742
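
For anyone repeating this kind of bisect on an intermittent failure, here is a
rough sketch of how many clean boots a bisect point needs before it can be
called good. It assumes every boot of a bad kernel fails independently with
the same probability, which is a simplification. p_fail, max_miss and
boots_needed() are hypothetical names, not values or code from this thread.

```c
#define MAX_BOOTS 100000	/* arbitrary safety cap for this sketch */

/*
 * Return the smallest number of consecutive clean boots n such that
 * (1 - p_fail)^n <= max_miss, i.e. the chance of a bad kernel passing
 * all n boots is at most max_miss.
 *
 * p_fail:   assumed per-boot failure probability of a bad kernel (0..1]
 * max_miss: acceptable chance of marking a bad kernel good (0..1)
 *
 * Returns -1 on invalid input or if the cap is exceeded.
 */
static int boots_needed(double p_fail, double max_miss)
{
	double all_pass = 1.0;	/* probability that every boot so far passed */
	int n = 0;

	if (p_fail <= 0.0 || p_fail > 1.0)
		return -1;
	if (max_miss <= 0.0 || max_miss >= 1.0)
		return -1;

	while (all_pass > max_miss) {
		if (n >= MAX_BOOTS)
			return -1;
		all_pass *= 1.0 - p_fail;
		n++;
	}
	return n;
}
```

Under those assumptions, boots_needed(0.5, 0.05) returns 5: five clean boots
leave about a 3% chance (0.5^5) that a bad kernel was marked good.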


>
> The problem manifests as a timeout from at least two different locations
> in drivers/net/usb/smsc95xx.c:
>
>
> static int smsc95xx_set_mac_address(struct usbnet *dev)
> {
> ...
> 	ret = smsc95xx_write_reg(dev, ADDRL, addr_lo);
> 	if (ret < 0) {
> 		netdev_warn(dev->net, "Failed to write ADDRL: %d\n", ret);
> 		return ret;
> 	}
>
>
> static int smsc95xx_reset(struct usbnet *dev)
> {
> ...
> 	write_buf = PM_CTL_PHY_RST_;
> 	ret = smsc95xx_write_reg(dev, PM_CTRL, write_buf);
> 	if (ret < 0) {
> 		netdev_warn(dev->net, "Failed to write PM_CTRL: %d\n", ret);
> 		return ret;
> 	}
>
>
> Some of the other smsc95xx_write_reg() calls in smsc95xx_reset() are
> protected with checks for timeout, with up to 100 retries. I do not know
> whether the PM_CTRL write should have the same protection.
>
> -Frank
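
A minimal user-space sketch of what a bounded retry on timeout could look
like, in case it helps the discussion. This is not the driver code:
reg_write_fn, write_reg_retry() and the flaky_write() stub are hypothetical
names, and the limit of 100 only mirrors the retry count mentioned above.
Whether retrying is the right fix, rather than finding out why the chip
times out, is a separate question.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical callback type standing in for a register write such as
 * smsc95xx_write_reg(): returns 0 on success or a negative errno.
 */
typedef int (*reg_write_fn)(void *ctx, uint32_t reg, uint32_t val);

#define MAX_WRITE_RETRIES 100

/* Retry the write only while it reports -ETIMEDOUT, up to the limit. */
static int write_reg_retry(reg_write_fn write_fn, void *ctx,
			   uint32_t reg, uint32_t val)
{
	int ret = -ETIMEDOUT;
	int attempt;

	for (attempt = 0; attempt < MAX_WRITE_RETRIES; attempt++) {
		ret = write_fn(ctx, reg, val);
		if (ret != -ETIMEDOUT)
			break;	/* success, or an error that retrying won't fix */
	}
	return ret;
}

/* Stub for illustration: times out a fixed number of times, then succeeds. */
struct flaky_ctx {
	int timeouts_left;
	int calls;
};

static int flaky_write(void *ctx, uint32_t reg, uint32_t val)
{
	struct flaky_ctx *f = ctx;

	(void)reg;
	(void)val;
	f->calls++;
	if (f->timeouts_left > 0) {
		f->timeouts_left--;
		return -ETIMEDOUT;
	}
	return 0;
}
```

With a stub that times out three times, write_reg_retry() succeeds on the
fourth call. With a stub that always times out, it gives up after 100 calls
and returns -ETIMEDOUT, so the caller still sees the failure.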

