Re: slab corruption in test9 (NFS related?)

From: Andrew Morton
Date: Sun Nov 09 2003 - 22:01:50 EST


Burton Windle <bwindle@xxxxxxxx> wrote:
>
> Forwarding to LKML as per Neil Brown.
>
>
> ---------- Forwarded message ----------
> Date: Sun, 9 Nov 2003 17:21:47 -0500 (EST)
> From: Burton Windle <bwindle@xxxxxxxx>
> To: Trond Myklebust <trond.myklebust@xxxxxxxxxx>, neilb@xxxxxxxxxxxxxxx
> Subject: nfsd caused slab corruption in test9
>
> Hello. I'm not quite sure whom to send this to, so my appologizes if you
> aren't the right people.
>
> Two linux boxes, both running Debian Testing, both x86 with 2.6.0-test9.
> The nfs client was running the nhfsstone stress test against an ext3 NFS
> exported share. After about an hour (I forgot it was running), the console on the
> server woke up and spit this out:
>
>
> Slab corruption: start=c9df6bcc, expend=c9df6c8b, problemat=c9df6bcc
> Last user: [<c017f87e>](d_callback+0x1e/0x40)
> Data: 6A**********************************************************************************************************************************************************************************************A5
> Next: 71 F0 2C .7E F8 17 C0 A5 C2 0F 17 00 00 00 00 08 00 00 00 3C 4B 24 1D 00 00 00 00 0A 00 00 00 slab error in check_poison_obj(): cache
> `dentry_cache': object was modified after freeing Call Trace:
> [<c0143653>] check_poison_obj+0xf3/0x180
> [<c0181d19>] d_alloc+0x19/0x360
> [<c01457da>] kmem_cache_alloc+0x13a/0x180
> [<c0181d19>] d_alloc+0x19/0x360
> [<c0174a3e>] cached_lookup+0x5e/0x80
> [<c0175ebb>] __lookup_hash+0x5b/0xa0
> [<c0175f10>] lookup_hash+0x10/0x20
> [<c0175f75>] lookup_one_len+0x55/0x80
> [<c01e7d05>] nfsd_lookup+0xa5/0x5e0
> [<c01f020b>] nfsd3_proc_lookup+0x8b/0x100
> [<c01e5147>] nfsd_dispatch+0xc7/0x1a0
> [<c0313a7f>] svc_process+0x49f/0x640
> [<c01e4c47>] nfsd+0x307/0x740
> [<c01e4940>] nfsd+0x0/0x740
> [<c0107429>] kernel_thread_helper+0x5/0x1c
>

Someone did a dput() of a freed dentry. This is quite possibly an nfsd bug.

There's also http://bugme.osdl.org/show_bug.cgi?id=1497, in which someone
did a scribble on the buffer_head slab's metadata. Could be unrelated.

There's a debug patch in -mm which should catch the rogue dput(). Perhaps
you could include it in your testing. It is at

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test9/2.6.0-test9-mm2/broken-out/atomic_dec-debug.patch

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/