Re: Widespread crashes in next-20180906

From: Guenter Roeck
Date: Thu Sep 06 2018 - 11:41:50 EST


On Thu, Sep 06, 2018 at 10:04:13AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 06, 2018 at 06:45:15AM -0700, Guenter Roeck wrote:
> > Build results:
> > total: 134 pass: 133 fail: 1
> > Failed builds:
> > sparc32:allmodconfig
> > Qemu test results:
> > total: 311 pass: 76 fail: 235
> > Failed builds:
> > <pretty much everything trying to boot from disk>
> >
> > Error message is always something like
> >
> > Filesystem requires source device
> > VFS: Cannot open root device "hda" or unknown-block(3,0): error -2
> >
> > The only variance is the boot device. Logs in full glory are available
> > at https://kerneltests.org/builders/, in the "next" column.
> >
> > I did not run bisect, but the recent filesystem changes are a definite suspect.
>
> Yes, this is the vm_fault_t changes. See the other thread on LKML.
> The guilty commit was: 83c0adddcc6e: fs: convert return type int to
> vm_fault_t
>
That thing is just asking for trouble. Why not leave the return type
and value alone and add a vm_fault_t * (assuming it really adds value)
as another parameter? Is it really a good idea to deviate from "return
a well-defined error as an integer", as used everywhere else in the
kernel? Do we really need a "my_favored_error_return_t" in every
subsystem going forward? Oh well, I guess (hope) that is all discussed
in the other thread.
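
To make the two API styles concrete, here is a minimal user-space sketch. It is an illustration under stated assumptions, not kernel code and not the actual commit: the type fault_code_t, the FAULT_CODE_* values and all function names are hypothetical stand-ins. The first style keeps the conventional "0 or -errno" int return and reports the fault code through an out-parameter. The second style returns the fault code directly. The last helper models an old caller that still treats any nonzero return as an error.

```c
#include <errno.h>

/* Hypothetical stand-in for a dedicated fault return type (NOT the
 * kernel's vm_fault_t); the values are illustrative only. */
typedef unsigned int fault_code_t;
#define FAULT_CODE_SIGBUS 0x0002u
#define FAULT_CODE_NOPAGE 0x0100u

/* Style 1: keep "return 0 or -errno as int" and pass the fault code
 * back through an extra pointer parameter. */
int fault_int_style(int fail, fault_code_t *code)
{
    if (fail) {
        *code = FAULT_CODE_SIGBUS;
        return -EFAULT;            /* conventional negative errno */
    }
    *code = FAULT_CODE_NOPAGE;
    return 0;                      /* 0 still means success */
}

/* Style 2: return the fault code itself; success is no longer 0. */
fault_code_t fault_typed_style(int fail)
{
    return fail ? FAULT_CODE_SIGBUS : FAULT_CODE_NOPAGE;
}

/* A caller written for the 0 / -errno convention: any nonzero value
 * is taken to be an error. */
int legacy_caller_sees_error(int ret)
{
    return ret != 0;
}
```

With style 1, an old caller keeps working unchanged. With style 2, the same caller misreads a successful FAULT_CODE_NOPAGE as a failure unless every call site is converted correctly. That per-call-site risk is the kind of thing the question above is pointing at.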

> This is the *second* time vm_fault_t patches have broken things. The
> first time it went through the ext4 tree, and I NACK'ed it after
> a 60 second smoke test showed it was broken. The second time
> the problem was supposedly fixed, but it went through the mm tree, and
> so I didn't have a chance to regression test it or stop it...
>
Looking at the patch, NACK seems like the proper response to me, maybe
augmented with "please refrain from shooting yourself (and everyone else)
in the foot".

Guenter