Re: Nfsd/gfs crashes/oops in 2.6.16-rc5

From: Bas van der Vlies
Date: Thu Mar 09 2006 - 02:41:14 EST


Bas van der Vlies wrote:
uname: 2.6.16-rc5
libc: libc-2.3.2.so
Debian: Sarge
SMP system: 2 CPU's

On our 4 node GFS-cluster we use nfs to export the GFS filesystems to our 640 node cluster On our fileserver nodes we get an nfs crash/oops. We tried serveral kernels and they crashes/oops are the same. We node
installed 2.6.16-rc5 and here is the oops:

nable to handle kernel NULL pointer dereference at virtual address 00000038
printing eip:
f89a4be3
*pde = 37809001
*pte = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: lock_dlm dlm cman dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage piix e1000 gfs lock_harness dm_mod
CPU: 0
EIP: 0060:[<f89a4be3>] Tainted: GF VLI
EFLAGS: 00010246 (2.6.16-rc5-sara3 #1)
EIP is at gfs_create+0x6f/0x153 [gfs]


Is is an GFS-crash and just for the record the GFS guys have made a fix in the CVS Stable branch:

CVSROOT: /cvs/cluster
Module name: cluster
Branch: STABLE
Changes by: bmarzins@xxxxxxxxxxxxxx 2006-03-08 20:47:09

Modified files:
gfs-kernel/src/gfs: ops_inode.c

Log message:
Really gross hack!!!
This is a workaround for one of the bugs the got lumped into 166701. It
breaks POSIX behavior in a corner case to avoid crashing... It's icky.
when NFS opens a file with O_CREAT, the kernel nfs daemon checks to see
if the file exists. If it does, nfsd does the *right thing* (either
opens the file, or if the file was opened with O_EXCL, returns an
error). If the file doesn't exist, it passes the request down to the
underlying file system. Unfortunately, since nfs *knows* that the file
doesn't exist, it doesn't bother to pass a nameidata structure, which
would include the intent information. However since gfs is a cluster
file system, the file could have been created on another node after nfs
checks for it. If this is the case, gfs needs the intent information to
do the *right thing*. It panics when it finds a NULL pointer, instead
of the nameidata. Now, instead of panicing, if gfs finds a NULL
nameidata pointer. It assumes that the file was not created with _EXCL.

This assumption could be wrong, with the result that an application
could thing that it has created a new file, when in fact, it has opened
an existing one.


--
--
********************************************************************
* *
* Bas van der Vlies e-mail: basv@xxxxxxx *
* SARA - Academic Computing Services phone: +31 20 592 8012 *
* Kruislaan 415 fax: +31 20 6683167 *
* 1098 SJ Amsterdam *
* *
********************************************************************
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/