RE: [NFS] [PATCH] NFS: fix client hang due to race condition

From: Lever, Charles
Date: Wed Jul 06 2005 - 21:17:04 EST


> The flags field in struct nfs_inode is protected by the BKL. The
> following two code paths (there may be more, but my test program only
> hits these two) modify the flags without obtaining the lock:
>
> nfs_end_data_update
> nfs_release
> nfs_file_release
> __fput
> fput
> filp_close
> sys_close
> syscall_call
>
> nfs_revalidate_mapping
> nfs_file_write
> do_sync_write
> vfs_write
> sys_write
> syscall_call
>
> Running multiple instances of a simple program [1] that opens, writes
> to, and closes NFS mounted files eventually results in the programs
> hanging on an SMP system (see kernel .config [3]).
>
> I've been testing this with 100 instances of the program:
> $ ./breaknfs 100 &
>
> Usually within 10 minutes, all instances of breaknfs will hang. They
> disappear from the output of 'top' and there is no NFS activity between
> the client and server.

[ sysrq output snipped... ]

> I've reproduced this bug on 2.6.11.10, 2.6.12-mm2, and 2.6.13-rc2.
>
> With my patch against 2.6.13-rc2 below, I ran 100 instances of breaknfs
> for 14 hours and was unable to get the client to hang.

i agree this is a problem.

but instead of using heavyweight synchronization, why not convert the
NFS_INO flags into atomic bitops? i have a patch that does that; it
would need to be ported to the latest kernels and tested to see if it
addresses the problem.
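
to illustrate the idea (this is only a rough userspace sketch using C11
atomics, not the patch itself and not kernel code): each flag update
becomes a single atomic read-modify-write, so two CPUs touching
different bits of the same word cannot lose each other's update and no
lock is needed around the flags word. the struct, the flag names and
the demo_* helpers are all hypothetical stand-ins for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers, for illustration only. */
enum { DEMO_INO_BUSY = 0, DEMO_INO_STALE = 1 };

/* Hypothetical stand-in for an inode with a flags word. */
struct demo_inode {
    atomic_ulong flags;
};

/* Atomically set bit nr; concurrent updates to other bits are preserved. */
static inline void demo_set_bit(int nr, atomic_ulong *addr)
{
    atomic_fetch_or(addr, 1UL << nr);
}

/* Atomically clear bit nr. */
static inline void demo_clear_bit(int nr, atomic_ulong *addr)
{
    atomic_fetch_and(addr, ~(1UL << nr));
}

/* Read the current value of bit nr. */
static inline bool demo_test_bit(int nr, atomic_ulong *addr)
{
    return (atomic_load(addr) >> nr) & 1UL;
}

/* Atomically set bit nr and report whether it was already set. */
static inline bool demo_test_and_set_bit(int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_or(addr, mask) & mask) != 0;
}
```

a caller would initialize the word with atomic_init(&ino.flags, 0) and
then use only these helpers. a plain "flags |= mask" is a separate
load, modify and store, which is where an unlocked update can be lost.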

nick, are you interested in trying it out?