Re: [Patch][NFSv4]: Kernel panic (handle kernel NULL pointer) occurred in NFSv4 by the nfs_update_inode()

From: Vincent ROQUETA
Date: Mon Feb 26 2007 - 07:59:21 EST


On Monday 29 January 2007 01:51, fj-shic wrote:
> Hi, all
>
> Does anyone have a good idea about this issue?
> I really think this is a problem in NFSv4, which should be resolved
> in the latest kernel.

I have applied this patch to fix the bug I have seen in the same function (see
the mail "Client freeze with 64K pages" on the NFSv4 mailing list).

As a result, there seems to be another unfixed bug further along in the NFSv4
stack (in the function nfs4_free_client, at unmount time), but at least the
kernel no longer crashes at the same point.

Since I am using a rarely used 64K-page configuration, I may be triggering
bugs that stay dormant in widely used NFS code.


Vincent


> Thank you for any information on this issue.
>
> > Hi, all
> >
> > I have been working on evaluating the robustness of NFS.
> > When I ran a file-creation test on NFSv4, a kernel panic (kernel
> > NULL pointer dereference) happened during the test when
> > /proc/sys/sunrpc/nfs_debug was set to 65535. The test I ran is as
> > follows (a minimal sketch of the reproducer appears after the list):
> > Mount: mount -t nfs4 192.168.0.21:/ /mnt
> > Test script:
> > A process creates and opens 1024 files on NFSv4.
> > The process writes a test message to each file.
> > The process closes all the open files and exits.
> > After the process exits, the test umounts the NFSv4 mount
> > immediately.
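> >
> > Here is a minimal user-space sketch of that test process (illustrative
> > only; the path and message below are made-up placeholders):
> >
> >     /* repro.c - open 1024 files on the NFSv4 mount, write to each,
> >      * close them all, then exit so the caller can umount at once. */
> >     #include <fcntl.h>
> >     #include <stdio.h>
> >     #include <unistd.h>
> >
> >     #define NFILES 1024
> >
> >     int main(void)
> >     {
> >             static int fds[NFILES];
> >             char path[64];
> >             int i;
> >
> >             for (i = 0; i < NFILES; i++) {
> >                     snprintf(path, sizeof(path), "/mnt/test.%d", i);
> >                     fds[i] = open(path, O_CREAT | O_WRONLY, 0644);
> >                     if (fds[i] < 0)
> >                             return 1;
> >                     write(fds[i], "test message\n", 13);
> >             }
> >             /* each close() queues the asynchronous NFSv4 CLOSE */
> >             for (i = 0; i < NFILES; i++)
> >                     close(fds[i]);
> >             return 0;   /* the test script runs "umount /mnt" now */
> >     }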
> >
> > In the normal case, I did not see the problem during the test; but
> > when /proc/sys/sunrpc/nfs_debug is set to debug mode (65535), the
> > panic usually happens.
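> >
> > For reference, that debug mode can be toggled with, e.g.:
> >
> >     echo 65535 > /proc/sys/sunrpc/nfs_debug   # enable full NFS debugging
> >     echo 0 > /proc/sys/sunrpc/nfs_debug       # back to normal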
> >
> > The kernel panic log is as follows:
> >
> > NFS: nfs_update_inode(0:18/2049980 ct=1 info=0xe)
> > BUG: unable to handle kernel NULL pointer dereference at virtual address
> > 0000000c
> > printing eip: deb82605
> > *pde = 001f3067
> > Oops: 0000 [#1]
> > SMP
> > last sysfs file: /block/hda/removable
> > Modules linked in: nfs fscache nfsd … …
> > CPU: 0
> > EIP: 0060:[<deb82605>] Not tainted VLI
> > EFLAGS: 00010246 (2.6.18-1.2747.el5 #1)
> > EIP is at nfs_update_inode+0xb0/0x692 [nfs]
> > …skip…
> > Process rpciod/0 (pid: 1865, ti=dd3ac000 task=ddd47550 task.ti=dd3ac000)
> > Stack: deba0609 … …
> > Call Trace:
> > [<deb82c1f>] nfs_refresh_inode+0x38/0x1b0 [nfs]
> > [<dea9f602>] rpc_exit_task+0x1e/0x6c [sunrpc]
> > [<dea9f314>] __rpc_execute+0x82/0x1b3 [sunrpc]
> > [<c0433899>] run_workqueue+0x83/0xc5
> > [<c0434171>] worker_thread+0xd9/0x10c
> > [<c0436620>] kthread+0xc0/0xec
> > [<c0404d63>] kernel_thread_helper+0x7/0x10
> > DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
> >
> > I have investigated the problem and found that the cause was the
> > NULL pointer inode->i_sb->s_root in nfs_update_inode() when the
> > panic happened.
> >
> > In the kernel, at the end of the file close operation,
> > nfs_file_release() is invoked.
> >
> > I have found the operation process in the kernel is as follows:
> >
> > nfs_file_release()
> >  |-- NFS_PROTO(inode)->file_release()
> >       nfs_file_clear_open_context()
> >        put_nfs_open_context()
> >         |-- nfs4_close_state()
> >         |     |-- nfs4_close_ops
> >         |          nfs4_do_close()
> >         |           nfs_update_inode()
> >         |            |-- inode == inode->i_sb->s_root->d_inode
> >         |-- mntput(ctx->vfsmnt)
> >               atomic_dec_and_lock(&mnt->mnt_count, &vfsmount_lock)
> >
> > After the asynchronous RPC call "nfs4_close_ops" is set up in
> > put_nfs_open_context(), the kernel invokes mntput(), and
> > mnt->mnt_count is decreased. In my test, after the file close
> > operation, sys_umount() was executed immediately. In the normal
> > case, the asynchronous RPC call "nfs4_close_ops" completes quickly,
> > and it rarely happens that sys_umount() is invoked before the
> > nfs_update_inode() operation has ended. But when sunrpc/nfs_debug is
> > set to debug mode, a lot of printk operations are invoked in NFS.
> > Because of all those printk operations, the RPC call "nfs4_close_ops"
> > is easily delayed, and it then becomes possible for sys_umount() to
> > run before nfs_update_inode() has finished. In do_umount() (since
> > mnt->mnt_count has already been decreased, the umount can proceed
> > successfully), sb->s_root is set to NULL in
> > shrink_dcache_for_umount(), which is invoked by nfs_kill_super().
> > Therefore, a kernel panic occurred through a NULL pointer access
> > when nfs_update_inode() accessed inode->i_sb->s_root.
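> >
> > Schematically, my reading of the race window (the exact interleaving
> > may vary):
> >
> >     rpciod (async close)            umount
> >     --------------------------      ------------------------------
> >     nfs4_close_ops queued
> >     ...delayed by nfs_debug         sys_umount()
> >        printk traffic...             do_umount()
> >                                       nfs_kill_super()
> >                                        shrink_dcache_for_umount()
> >                                          sb->s_root = NULL
> >     nfs_update_inode()
> >      inode->i_sb->s_root->d_inode   <-- NULL pointer dereference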
> >
> > Because sb->s_root of a super_block can be set to NULL by umount
> > while nfs4_close_ops() has not yet finished in NFS, it is really
> > necessary to check for a NULL inode->i_sb->s_root in
> > nfs_update_inode().
> >
> > This problem still exists in the latest Linux kernel.
> > To resolve it, I have made the patch below against linux-2.6.19.2.
> > With the patch applied, the problem no longer occurs.
> >
> > Signed-off-by: ShiChao <shic@xxxxxxxxxxxxxxxx>
> >
> > --- old/fs/nfs/inode.c 2007-01-17 17:19:48.994020288 -0500
> > +++ new/fs/nfs/inode.c 2007-01-17 17:20:26.570307824 -0500
> > @@ -914,7 +914,7 @@
> >
> > server = NFS_SERVER(inode);
> > /* Update the fsid if and only if this is the root directory */
> > - if (inode == inode->i_sb->s_root->d_inode
> > + if (inode->i_sb->s_root && inode == inode->i_sb->s_root->d_inode
> > && !nfs_fsid_equal(&server->fsid, &fattr->fsid))
> > server->fsid = fattr->fsid;


---------------------------------------------------------------------------------------------------------------
For reference, the crash at unmount time in nfs4_free_client mentioned above:
NFS: nfs_update_inode(0:15/7946242 ct=1 info=0xe)
VFS: Busy inodes after unmount of 0:15. Self-destruct in 5 seconds. Have a
nice day...
kernel BUG at fs/nfs/nfs4state.c:125!
umount[11087]: bugcheck! 0 [1]
Modules linked in: nfs sg scsi_dump perfbull mdm nfsd exportfs lockd eip ep
rms elan4 elan3 sunrpc elan qsnet ide_cd cdrom vfat fat dm_mod thermal
processor fan button uhci_hcd ehci_hcd e100 e1000 ext3 jbd aic7xxx
scsi_transport_spi sd_mod scsi_mod

Pid: 11087, CPU 3, comm: umount
psr : 00001010081a6018 ifs : 800000000000038a ip : [<a0000002083189e0>]
Not tainted
ip is at nfs4_free_client+0x1e0/0x200 [nfs]
unat: 0000000000000000 pfs : 000000000000038a rsc : 0000000000000003
rnat: 0000000000000073 bsps: 0000000000000073 pr : 0000000005552959
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000002083189e0 b6 : a000000100065220 b7 : a0000001003781a0
f6 : 1003e0000000000000028 f7 : 1003e28f5c28f5c28f5c3
f8 : 1003e00000000000000fa f9 : 1003e0000000032000000
f10 : 1003e000000003b9aca00 f11 : 1003ed6bf94d5e57a42bd
r1 : a000000100a6e140 r2 : 0000000000004000 r3 : e000000004a20000
r8 : 0000000000000029 r9 : 0000000000000001 r10 : a0000001008884c0
r11 : 0000000000000000 r12 : e00000010f06fd60 r13 : e00000010f060000
r14 : 0000000000004000 r15 : a0000001008884c0 r16 : 0000000000004000
r17 : 0000000000000003 r18 : 0000000000000000 r19 : 0000000000000000
r20 : 0000000000000004 r21 : 0000000000000000 r22 : e00000010f060f54
r23 : 0000000000000001 r24 : e000000103140f5c r25 : e000000103140f40
r26 : e000000103140008 r27 : 0000000000000073 r28 : 0000000000000073
r29 : 6666666666666667 r30 : 00000000000000c8 r31 : 00000000000004b0

Call Trace:
[<a000000100010f60>] show_stack+0x80/0xa0
sp=e00000010f06f8b0 bsp=e00000010f0612e8
[<a000000100011870>] show_regs+0x890/0x8c0
sp=e00000010f06fa80 bsp=e00000010f0612a0
r32 : e00000010f06fce8 r33 : e00000010f06fa90 r34 : e00000010f06fbe0
r35 : e00000010f06fc18 r36 : e00000010f06fc30 r37 : e00000010f06fbf0
r38 : a000000100037b20 r39 : 0000000000000510 r40 : e00000010f06fab0
[<a000000100037b20>] die+0x1a0/0x4c0
sp=e00000010f06faa0 bsp=e00000010f061250
r32 : a0000001005e5c78 r33 : e00000010f06fba0 r34 : 0000000000000000
r35 : 0000000000000003 r36 : a000000100736ae0 r37 : a000000100736ad8
r38 : a000000100736adc r39 : a000000100037e80 r40 : 0000000000000309
r41 : a000000100a6e140
[<a000000100037e80>] die_if_kernel+0x40/0x60
sp=e00000010f06fac0 bsp=e00000010f061220
r32 : a0000001005e5c78 r33 : e00000010f06fba0 r34 : 0000000000000000
r35 : a000000100038550 r36 : 0000000000000289 r37 : e00000010f06fd38
[<a000000100038550>] ia64_bad_break+0x6b0/0x820
sp=e00000010f06fac0 bsp=e00000010f0611f0
r32 : 0000000000000000 r33 : e00000010f06fba0 r34 : a00000010000b780
r35 : 0000000000000002 r36 : a0000001000a2550
[<a00000010000b780>] ia64_leave_kernel+0x0/0x280
sp=e00000010f06fb90 bsp=e00000010f0611f0
[<a0000002083189e0>] nfs4_free_client+0x1e0/0x200 [nfs]
sp=e00000010f06fd60 bsp=e00000010f0611a0
r32 : e00000016f83a180 r33 : e00000016f83a1f8 r34 : 0000000000200200
r35 : 0000000000100100 r36 : a0000002083190b0 r37 : 0000000000000288
r38 : a0000002083041a0 r39 : a000000208332dc0 r40 : a000000208332dd8
r41 : 000000000000007d
[<a0000002083190b0>] nfs4_put_client+0x110/0x180 [nfs]
sp=e00000010f06fd60 bsp=e00000010f061178
r32 : e00000016f83a180 r33 : a0000002083482e8 r34 : a0000002083191a0
r35 : 0000000000000286 r36 : a0000002083041a0
[<a0000002083191a0>] destroy_nfsv4_state+0x80/0xa0 [nfs]
sp=e00000010f06fd60 bsp=e00000010f061150
r32 : e0000001017ffbf0 r33 : e0000001017ffbf8 r34 : a0000002082d7350
r35 : 0000000000000286 r36 : a0000002083041a0
[<a0000002082d7350>] nfs4_kill_super+0x90/0x160 [nfs]
sp=e00000010f06fd60 bsp=e00000010f061128
r32 : e000000103154200 r33 : e0000001017ffa80 r34 : a00000010014b900
r35 : 0000000000000308 r36 : a0000002083041a0
[<a00000010014b900>] deactivate_super+0x140/0x220
sp=e00000010f06fd60 bsp=e00000010f0610f8
r32 : e000000103154200 r33 : a000000100880cd0 r34 : a0000002083429c0
r35 : a000000100185060 r36 : 0000000000000389 r37 : a000000100a6e140
[<a000000100185060>] mntput_no_expire+0x160/0x260
sp=e00000010f06fd60 bsp=e00000010f0610c0
r32 : e000000103154200 r33 : e0000001040e4c80 r34 : a0000001006e9200
r35 : a0000001006e9300 r36 : a00000010015df20 r37 : 0000000000000205
r38 : a000000100a6e140
[<a00000010015df20>] path_release_on_umount+0x40/0x60
sp=e00000010f06fd60 bsp=e00000010f0610a0
r32 : e00000010f06fd78 r33 : a000000100186670 r34 : 0000000000000916
r35 : e0000001040e4c90
[<a000000100186670>] sys_umount+0x450/0x800
sp=e00000010f06fd60 bsp=e00000010f061010
r32 : 6000000000012db0 r33 : 0000000000000000 r34 : 600ffffffffef270
r35 : 600ffffffffef2f0 r36 : 4000000000006f10 r37 : c000000000000309
r38 : c000000000000611 r39 : 4000000000004650 r40 : 0000000000000000
r41 : 0000000000000000 r42 : 0000000000000000 r43 : e00000010f06fe30
r44 : a0000001006e9300 r45 : a000000208342a80 r46 : a000000100881e48
r47 : a00000010000b600 r48 : 0000000000000008 r49 : a000000100a6e140
[<a00000010000b600>] ia64_ret_from_syscall+0x0/0x20
sp=e00000010f06fe30 bsp=e00000010f061010
[<a000000000010640>] __kernel_syscall_via_break+0x0/0x20
sp=e00000010f070000 bsp=e00000010f061010
LKCD: CPU : 3 Reconfiguring memory bank information....
LKCD: This may take a while....
First cpu will configure dump header
LKCD: smp_num_cpus in header 16
CPU 3, 15 cpus remaining
Last cpu
LKCD: Dumping to block device (8,17) on CPU 3 ...
LKCD: Writing dump header
....LKCD:
18030 dump pages saved of 65536 each in pass 0
LKCD:
29090 dump pages skipped of 65536 each in pass 1
LKCD:
1000319 dump pages skipped of 65536 each in pass 2
LKCD:
0 dump pages skipped of 65536 each in pass 3

LKCD: CPU : 3 dumped 257 blocks of 262144 bytes each
LKCD: Dump Complete; 18030 dump pages saved.
The system will automatically reboot now !
