Re: Heads-up: 3.6.2 / 3.6.3 NFS server panic: 3.6.2+ regression?

From: J. Bruce Fields
Date: Tue Oct 23 2012 - 10:30:14 EST


On Tue, Oct 23, 2012 at 03:07:59PM +0100, Nix wrote:
> On 23 Oct 2012, J. Bruce Fields uttered the following:
>
> > On Mon, Oct 22, 2012 at 05:17:04PM +0100, Nix wrote:
> >> I just had a panic/oops on upgrading from 3.6.1 to 3.6.3, after weeks of
> >> smooth operation on 3.6.1: one of the NFS changes that went into one of
> >> the two latest stable kernels appears to be lethal after around half an
> >> hour of uptime. The oops came from NFSv4, IIRC (relying on memory since
> >> my camera was recharging and there is no netconsole from that box
> >> because it is where the netconsole logs go, so I'll have to reproduce it
> >> later today). The machine is an NFSv3 server only at present, with no
> >> NFSv4 running (though NFSv4 is built in).
> >
> > Note recent clients may try to negotiate NFSv4 by default, so it's
> > possible to use it without knowing.
>
> Every NFS import from all this server's clients has 'vers=3' forced on
> (for now, until I get around to figuring out what if anything needs to
> be done to move to NFSv4: it may be the answer is 'nothing' but I tread
> quite carefully with this server, since my $HOME is there).
>
> /proc/fs/nfsfs/volumes on the clients confirms that everything is v3.
>
> > You didn't change anything else about your server or clients recently?
>
> Nope (other than upgrading the clients to 3.6.3 in concert). Running
> 3.6.1 here on that server now and 3.6.3 on all the clients, and no crash
> in over a day.
>
> nfs-utils is a bit old, 1.2.6-rc6, I should probably upgrade it...

nfs-utils shouldn't be capable of oopsing the kernel, so from my
(selfish) point of view I'd actually rather you stick with whatever you
have and try to reproduce the oops.

(It's unlikely to make any difference anyway.)

--b.

> > I don't see an obvious candidate on a quick skim of v3.6.1..v3.6.3
> > commits, but of course I could be missing something.
>
> I'll try rebooting into it again soon, and get an oops report if it
> should happen again (and hopefully less filesystem corruption this time,
> right after a reboot into a new kernel was clearly the wrong time to
> move a bunch of directories around).
>
> Sorry for the continuing lack of useful information... I'll try to fix
> that shortly (once with a report of a false alarm, since I really want
> this stable kernel: my server is affected by the tx stalls fixed by
> 3468e9d and I'm getting tired of the frequent 1s lags talking to it: I
> could just apply that one fix, but I'd rather track this down properly).