Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

From: Patrick McLean
Date: Fri Nov 10 2017 - 18:26:36 EST

On 2017-11-10 10:42 AM, Linus Torvalds wrote:
> On Thu, Nov 9, 2017 at 5:58 PM, Patrick McLean <chutzpah@xxxxxxxxxx> wrote:
>>
>> Something must have changed since 4.13.8 to trigger this though.
>
> Arnd pointed to some commits that might be relevant for the cp210x
> module, but those are all already in 4.13.8, so if 4.13.8 really is
> rock solid for you, I don't think that's it.
>
> I really don't see anything that looks even half-way suspicious in
> that 4.13.8..11 range. But as mentioned, compiler interactions can be
> _really_ subtle.
>
> And hey, it can be a real kernel bug too, that just happens to be
> exposed by RANDSTRUCT, so a bisect really would be very nice.

I am working on bisecting the issue now, but I think I have some more
evidence pointing to a compiler issue related to RANDSTRUCT. We have
actually seen three different symptoms: sometimes we get the null pointer
dereference from the initial message, sometimes we get the GPF, and
sometimes the NFS clients see all files as root-owned directories. Any
given kernel build always shows the same symptom, but after a
"make mrproper" and recompile (with the same .config), the symptom often
changes. I suspect that all three are actually the same underlying
problem manifesting itself in different ways depending on which seed the
RANDSTRUCT gcc plugin is using.

>
> Because in the end, compiler bugs are very rare. They are particularly
> annoying when they do happen, though, so they loom big in the mind of
> people who have had to chase them down.
>