Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

From: Fengguang Wu
Date: Mon Feb 10 2014 - 04:25:27 EST


On Sat, Feb 08, 2014 at 03:10:37PM -0500, Tejun Heo wrote:
> Hello, David, Fengguang, Chris.
>
> On Fri, Feb 07, 2014 at 01:13:06PM -0800, David Rientjes wrote:
> > On Fri, 7 Feb 2014, Fengguang Wu wrote:
> >
> > > On Fri, Feb 07, 2014 at 02:13:59AM -0800, David Rientjes wrote:
> > > > On Fri, 7 Feb 2014, Fengguang Wu wrote:
> > > >
> > > > > [ 1.625020] BTRFS: selftest: Running btrfs_split_item tests
> > > > > [ 1.627004] BTRFS: selftest: Running find delalloc tests
> > > > > [ 2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
> > > > > [ 292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1, oom_score_adj=0
> > > > > [ 292.086439] kthreadd cpuset=
> > > > > [ 292.087072] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > > > [ 292.087372] IP: [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
> > > >
> > > > This looks like a problem with the cpuset cgroup name, are you sure this
> > > > isn't related to the removal of cgroup->name?
> > >
> > > It looks not related to patch "cgroup: remove cgroup->name", because
> > > that patch lies in the cgroup tree and not contained in output of "git log BAD_COMMIT".

Sorry, I was wrong here. I found that the above dmesg is for commit
4830363, which is a merge HEAD that contains the cgroup code.

The dmesg for commit 878a876b2e1 ("Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs")
looks different; that boot hangs after the tsc line:

[ 2.428110] Btrfs loaded, assert=on, integrity-checker=on
[ 2.429469] BTRFS: selftest: Running btrfs free space cache tests
[ 2.430874] BTRFS: selftest: Running extent only tests
[ 2.432135] BTRFS: selftest: Running bitmap only tests
[ 2.433359] BTRFS: selftest: Running bitmap and extent tests
[ 2.434675] BTRFS: selftest: Free space cache tests finished
[ 2.435959] BTRFS: selftest: Running extent buffer operation tests
[ 2.437350] BTRFS: selftest: Running btrfs_split_item tests
[ 2.438843] BTRFS: selftest: Running find delalloc tests
[ 3.158351] tsc: Refined TSC clocksource calibration: 2666.596 MHz


> > It's dying on pr_cont_kernfs_name which is some tree that has "kernfs:
> > implement kernfs_get_parent(), kernfs_name/path() and friends", which is
> > not in linux-next, and is obviously printing the cpuset cgroup name.
> >
> > It doesn't look like it has anything at all to do with btrfs or why they
> > would care about this failure.
>
> Yeah, this is from a patch in cgroup/review-post-kernfs-conversion
> branch which updates cgroup to use pr_cont_kernfs_name(). I forgot
> that cgrp->kn is NULL for the dummy_root's top cgroup and thus it ends
> up calling the kernfs functions with NULL kn and thus the oops. I
> posted an updated patch and the git branch has been updated.
>
> http://lkml.kernel.org/g/20140208200640.GB10975@xxxxxxxxxxxxxx
>
> So, nothing to do with btrfs and it looks like somehow the test
> apparatus is mixing up branches?

Yes - I may do random merges and boot test the resulting kernels.
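
For reference, the failure mode Tejun describes (a kernfs helper called
with a NULL kn) can be sketched in userspace as below. This is only an
illustration under stated assumptions: struct demo_node, demo_node_name()
and demo_fault_address() are hypothetical names, the padding is chosen
purely so that the member lands at offset 0x38 to match the fault address
in the oops, and none of this is the real struct kernfs_node layout or the
actual fix that was posted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a kernfs node. The pad[] size is an
 * assumption made only so that `name` sits at offset 0x38; it does
 * NOT reflect the real struct kernfs_node layout.
 */
struct demo_node {
	char pad[0x38];
	const char *name;
};

/* Compile-time check that the illustrative layout is what we claim. */
static_assert(offsetof(struct demo_node, name) == 0x38,
	      "name must sit at offset 0x38 in this sketch");

/*
 * Reading kn->name through a NULL kn touches address
 * 0 + offsetof(struct demo_node, name), which is why a NULL
 * dereference is reported at a small non-zero address.
 */
static size_t demo_fault_address(void)
{
	return offsetof(struct demo_node, name);
}

/*
 * NULL-safe accessor: return a placeholder instead of dereferencing
 * a NULL node (e.g. a top-level object that has no node attached).
 */
static const char *demo_node_name(const struct demo_node *kn)
{
	if (!kn)
		return "(null)";
	return kn->name ? kn->name : "(unnamed)";
}
```

With this layout demo_fault_address() evaluates to 0x38, and
demo_node_name(NULL) returns the placeholder string rather than faulting.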

Thanks,
Fengguang