Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

From: Linus Torvalds
Date: Fri Nov 10 2017 - 13:42:58 EST


On Thu, Nov 9, 2017 at 5:58 PM, Patrick McLean <chutzpah@xxxxxxxxxx> wrote:
>
> Something must have changed since 4.13.8 to trigger this though.

Well, yes and no.

Obviously something changed, but it doesn't necessarily have to be
anything particular.

Almost every time we've seen compiler bugs, it's been an innocuous
change that just happened to trigger a latent issue. Pretty much by
definition compiler bugs tend to be about rare situations, so it's
some odd special case that triggers.

Since it's apparently fairly repeatable for you, a bisection between
4.13.8 and 4.13.11 would be very interesting, and shouldn't take all
that long. There's only 142 commits in that range, so even just a
partial bisection of say four or five rounds should narrow it down to
just a couple of commits. And even a full bisection should only take
something like 8 build/test cycles.
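
As a rough check on those numbers, here is a minimal Python sketch of the
arithmetic. It assumes a strictly linear history in which every build/test
cycle discards half of the remaining candidates (rounding up for the worst
case). Real "git bisect" walks a commit graph and can differ by a step, and
the function names below are hypothetical, chosen only for illustration.

```python
import math


def remaining_after(commits, rounds):
    """Worst-case candidate commits left after `rounds` bisection steps.

    Simplified model (an assumption): linear history, each test halves
    the remaining range, rounding up.
    """
    n = commits
    for _ in range(rounds):
        n = math.ceil(n / 2)
    return n


def full_bisection_rounds(commits):
    """Worst-case number of build/test cycles to isolate a single commit."""
    rounds = 0
    n = commits
    while n > 1:
        n = math.ceil(n / 2)
        rounds += 1
    return rounds


if __name__ == "__main__":
    total = 142  # commits between 4.13.8 and 4.13.11, per the text above
    for r in (4, 5):
        print(f"after {r} rounds: at most {remaining_after(total, r)} commits left")
    print(f"full bisection: about {full_bisection_rounds(total)} build/test cycles")
```

Under this model, 142 commits shrink to at most 9 after four rounds and 5
after five, and a full bisection takes 8 cycles, which is consistent with
the estimates above.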

Arnd pointed to some commits that might be relevant for the cp210x
module, but those are all already in 4.13.8, so if 4.13.8 really is
rock solid for you, I don't think that's it.

I really don't see anything that looks even half-way suspicious in
that 4.13.8..11 range. But as mentioned, compiler interactions can be
_really_ subtle.

And hey, it can be a real kernel bug too, that just happens to be
exposed by RANDSTRUCT, so a bisect really would be very nice.

Because in the end, compiler bugs are very rare. They are particularly
annoying when they do happen, though, so they loom large in the minds of
people who have had to chase them down.

Linus