Re: Soft lockup issue in Linux 4.1.9

From: Holger Hoffstätte
Date: Thu Oct 01 2015 - 06:52:14 EST



On Thu, 01 Oct 2015 06:41:46 +0200, Andre Tomt wrote:

> On 01. okt. 2015 00:37, Holger Hoffstätte wrote:
>> On Wed, 30 Sep 2015 23:59:43 +0200, Olivier Bonvalet wrote:
>>
>>> For information, I've just upgraded 6 servers from Linux 4.1.8 to Linux
>>> 4.1.9, and am seeing random soft lockups. In case this helps:
>>
>> Congratulations! You're not the first one to get hit by this, but
>> you are probably the first one to get a meaningful stacktrace! \o/
>>
>>> [ 204.478380] Call Trace:
>>> [ 204.478381] <IRQ>
>>> [ 204.478385] [<ffffffff81076121>] ? try_to_del_timer_sync+0x43/0x4d
>>> [ 204.478386] [<ffffffff810760de>] ? del_timer+0x4d/0x4d
>>> [ 204.478388] [<ffffffff8107614b>] ? del_timer_sync+0x20/0x3d
>>
>> Can you try to revert
>>
>> [PATCH 4.1 157/159] inet: fix races with reqsk timers
>>
>> and see how that works for you? I'll do the same on my end. So far the
>> only thing I ever could glean was an rcu stall after cpuidle_enter(),
>> but never anything regarding the timer - though it was definitely
>> related to NIC activity after idle.
>
> I'm running with this patch reverted now as well. 2 hours no issues so
> far, but I can't conclude anything yet as I've seen it take up to 6+
> hours to explode here. As a result the bisect was going veeery slowly.

Now 12+ hours without problems with the patch reverted. I never got this
far with the patch included, as the machine would usually freeze during
idle periods.

As far as I'm concerned this is the culprit and should be reverted in
4.1.x, unless Eric can suggest how to fix this. (cc'ed).

cheers
Holger
