Re: Heads-up: 3.6.2 / 3.6.3 NFS server panic: 3.6.2+ regression?

From: Nix
Date: Tue Oct 23 2012 - 10:08:00 EST


On 23 Oct 2012, J. Bruce Fields uttered the following:

> On Mon, Oct 22, 2012 at 05:17:04PM +0100, Nix wrote:
>> I just had a panic/oops on upgrading from 3.6.1 to 3.6.3, after weeks of
>> smooth operation on 3.6.1: one of the NFS changes that went into one of
>> the two latest stable kernels appears to be lethal after around half an
>> hour of uptime. The oops came from NFSv4, IIRC (relying on memory since
>> my camera was recharging and there is no netconsole from that box
>> because it is where the netconsole logs go, so I'll have to reproduce it
>> later today). The machine is an NFSv3 server only at present, with no
>> NFSv4 running (though NFSv4 is built in).
>
> Note recent clients may try to negotiate NFSv4 by default, so it's
> possible to use it without knowing.

Every NFS import on all of this server's clients has 'vers=3' forced on
(for now, until I get around to figuring out what, if anything, needs to
be done to move to NFSv4: it may be that the answer is 'nothing', but I
tread quite carefully with this server, since my $HOME is there).

/proc/fs/nfsfs/volumes on the clients confirms that everything is v3.
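
For anyone wanting to repeat that check across a lot of clients, here is a
rough sketch of one way to do it. It is not what I actually ran: it
assumes the mount options reported in /proc/mounts (rather than
/proc/fs/nfsfs/volumes) and that the version shows up there as 'vers=' or
'nfsvers='. The function name is made up for illustration.

```python
def nfs_mounts_not_v3(mounts_text):
    """Return (mount_point, version) pairs for NFS mounts that are not v3.

    mounts_text is the content of a /proc/mounts-style file (assumed
    format: device, mount point, fstype, comma-separated options, ...).
    """
    offenders = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        # Turn "rw,vers=3,hard" into {"rw": "", "vers": "3", "hard": ""}.
        options = dict(
            opt.split("=", 1) if "=" in opt else (opt, "")
            for opt in fields[3].split(",")
        )
        version = options.get("vers") or options.get("nfsvers") or "unknown"
        if version != "3":
            offenders.append((fields[1], version))
    return offenders
```

Feeding it the text of /proc/mounts from each client should give an empty
list if everything really is mounted with v3.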

> You didn't change anything else about your server or clients recently?

Nope (other than upgrading the clients to 3.6.3 in concert). Running
3.6.1 here on that server now and 3.6.3 on all the clients, and no crash
in over a day.

nfs-utils is a bit old (1.2.6-rc6); I should probably upgrade it...

> I don't see an obvious candidate on a quick skim of v3.6.1..v3.6.3
> commits, but of course I could be missing something.

I'll try rebooting into it again soon, and get an oops report if it
should happen again (and hopefully less filesystem corruption this time:
right after a reboot into a new kernel was clearly the wrong time to
move a bunch of directories around).

Sorry for the continuing lack of useful information... I'll try to fix
that shortly (ideally with a report of a false alarm, since I really want
this stable kernel: my server is affected by the tx stalls fixed by
3468e9d and I'm getting tired of the frequent 1s lags talking to it. I
could just apply that one fix, but I'd rather track this down properly).