Re: frequent lockups in 3.18rc4

From: Dave Jones
Date: Thu Dec 18 2014 - 22:59:44 EST


On Thu, Dec 18, 2014 at 07:49:41PM -0800, Linus Torvalds wrote:

> And when spinlocks start getting contention, *nested* spinlocks
> really really hurt. And you've got all the spinlock debugging on etc,
> don't you?

Yeah, though remember this seems to have gotten worse in more recent
builds, for some reason. I've been running kitchen-sink debug kernels
for my trinity runs for the last three years, and it's only in the
last few months that this has become enough of a problem that I'm
not seeing the more interesting bugs. (Or perhaps we're just getting
better at fixing them in -next now, so my runs are lasting longer.)

> Also, you do have this:
>
> sched: RT throttling activated
>
> so there's something going on with RT scheduling too.

I see that fairly often. I've never dug into exactly what causes it, but
it seems to be triggerable just by some long-running CPU hogs.
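
For context on the "RT throttling activated" message: as far as I understand
it, the kernel prints this once when realtime tasks use up their runtime
budget within a scheduling period. The budget is controlled by two sysctls,
sched_rt_runtime_us and sched_rt_period_us. The sketch below only does the
arithmetic for that budget. The function names and the default /proc path
argument are illustrative assumptions, not something from this thread.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the scheduler sysctls on a typical Linux system.
PROC_DIR = Path("/proc/sys/kernel")


def rt_budget_fraction(runtime_us: int, period_us: int) -> Optional[float]:
    """Return the share of each period that realtime tasks may consume.

    A negative runtime means RT throttling is disabled, reported here as None.
    """
    if runtime_us < 0:
        return None
    return runtime_us / period_us


def read_rt_budget(proc_dir: Path = PROC_DIR) -> Optional[float]:
    """Read both sysctls from proc_dir and compute the RT budget fraction."""
    runtime_us = int((proc_dir / "sched_rt_runtime_us").read_text())
    period_us = int((proc_dir / "sched_rt_period_us").read_text())
    return rt_budget_fraction(runtime_us, period_us)
```

Calling read_rt_budget() on the test machine would show how much headroom
realtime tasks have before the throttle kicks in.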

> So your printouts are finally starting to make sense. But I'm also
> starting to suspect strongly that the problem is that with all your
> lock debugging and other overheads (does this still have
> DEBUG_PAGEALLOC?) you really are getting into a "real" softlockup
> because things are scaling so horribly badly.
>
> If you now disable spinlock debugging and lockdep, hopefully that page
> table lock now doesn't always get hung up on the lockdep locking, so
> it starts scaling much better, and maybe you'd not see this...

I can give it a shot. Hopefully there's some further mitigation that
could be done to allow a workload like this to survive under a debug
build though, as we've caught *so many* bugs with this stuff in the past.
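
Before rebuilding, it can help to confirm which of the heavy debug options a
given kernel config actually has enabled. The following is a minimal sketch
that scans the text of a .config file. The option list is an assumption about
what "spinlock debugging and lockdep" plus DEBUG_PAGEALLOC map to in Kconfig,
and the function name is hypothetical.

```python
# Assumed set of Kconfig symbols relevant to the suggestion above.
DEBUG_OPTIONS = (
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_PROVE_LOCKING",
    "CONFIG_LOCKDEP",
    "CONFIG_DEBUG_PAGEALLOC",
)


def enabled_debug_options(config_text, options=DEBUG_OPTIONS):
    """Return the options from `options` that are set to y or m in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name in options and value in ("y", "m"):
            enabled.add(name)
    # Preserve the order of `options` so the output is stable.
    return [opt for opt in options if opt in enabled]
```

Feeding it the contents of the build's .config (or /proc/config.gz after
decompression, where available) lists which of these are still switched on.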

Dave
