Re: [XFS on bad superblock] BUG: unable to handle kernel NULL pointer dereference at 00000003

From: Fengguang Wu
Date: Thu Oct 10 2013 - 02:03:42 EST


On Thu, Oct 10, 2013 at 03:28:20PM +1100, Dave Chinner wrote:
> On Thu, Oct 10, 2013 at 11:38:34AM +0800, Fengguang Wu wrote:
> > On Thu, Oct 10, 2013 at 11:33:00AM +0800, Fengguang Wu wrote:
> > > On Thu, Oct 10, 2013 at 11:26:37AM +0800, Fengguang Wu wrote:
> > > > Dave,
> > > >
> > > > > I note that you have CONFIG_SLUB=y, which means that the cache slabs
> > > > > are shared with objects of other types. That means that the memory
> > > > > corruption problem is likely to be caused by one of the other
> > > > > filesystems that is probing the block device(s), not XFS.
> > > >
> > > > Good to know that. It would be easy to test then: just turn off
> > > > every other filesystem. I'll try it right away.
> > >
> > > Seems that we don't even need to do that. Digging through the oops
> > > database, I find stack dumps from other filesystems as well.
> > >
> > > This happens in the kernel with the same kconfig, at commit 3.12-rc1.
> >
> > Here is a summary of all FS with oops:
> >
> > 411 ocfs2_fill_super
> > 189 xfs_fs_fill_super
> > 86 jfs_fill_super
> > 50 isofs_fill_super
> > 33 fat_fill_super
> > 18 vfat_fill_super
> > 15 msdos_fill_super
> > 11 ext2_fill_super
> > 10 ext3_fill_super
> > 3 reiserfs_fill_super
>
> The order of probing on the original dmesg output you reported is:
>
> ext3
> ext2
> fatfs
> reiserfs
> gfs2
> isofs
> ocfs2

There is effectively no particular order, because there are many
superblocks for these filesystems to scan:

        for superblocks:
                for filesystems:
                        scan super block

In the end, any filesystem may affect any of the others (and perhaps a
later run of itself).
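
To make the interleaving concrete, here is a minimal sketch of that
nested loop. The device and filesystem names are made up for
illustration, and the real probing is done by the kernel at boot, not
by a script like this. It only shows that with more than one device,
every filesystem's probe runs before some probe of every other
filesystem, itself included.

```python
# Hypothetical device and filesystem lists, for illustration only.
devices = ["dev0", "dev1", "dev2"]
filesystems = ["ext3", "ext2", "xfs", "ocfs2"]


def probe_sequence(devices, filesystems):
    """Return the (device, filesystem) probe attempts in nested-loop order."""
    return [(dev, fs) for dev in devices for fs in filesystems]


def runs_before(sequence, first, second):
    """True if some probe by `first` happens before some probe by `second`."""
    seen_first = False
    for _, fs in sequence:
        if fs == second and seen_first:
            return True
        if fs == first:
            seen_first = True
    return False


if __name__ == "__main__":
    seq = probe_sequence(devices, filesystems)
    for a in filesystems:
        for b in filesystems:
            print(a, "can run before", b, ":", runs_before(seq, a, b))
```

With a single device the probe order would matter; with several, it
does not.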

> which means that no XFS filesystem was mounted in the original bug
> report, and hence that further indicates that XFS is not responsible
> for the problem and that perhaps the original bisect was not
> reliable...

This is an easily reproducible bug. And I further confirmed it in
two ways:

1) turn off XFS, build 39 commits and boot them 2000+ times

=> not a single mount error

2) turn off all other filesystems, build 2 kernels (v3.12-rc3 and
v3.12-rc4) and boot them

=> half of the boots oops

So it may well be that XFS is impacted by an earlier run of itself.
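
As a rough sanity check on experiment 1), here is a small sketch of how
unlikely 2000 clean boots would be if the XFS-less kernels still oopsed
at some per-boot rate. The rates tried are assumptions picked for
illustration, not measurements, and the model treats boots as
independent, which is a simplification.

```python
def prob_all_clean(per_boot_failure_rate, boots):
    """Probability that `boots` independent boots all succeed."""
    return (1.0 - per_boot_failure_rate) ** boots


if __name__ == "__main__":
    # Assumed per-boot oops rates, for illustration only.
    for rate in (0.5, 0.01, 0.001):
        print(rate, prob_all_clean(rate, 2000))
```

Even a 1% per-boot rate makes 2000 consecutive clean boots very
improbable under this model.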

Thanks,
Fengguang