Re: 2.6.34-rc3: simple du (on a big xfs tree) triggers oom killer[bisected: 57817c68229984818fea9e614d6f95249c3fb098]

From: Dave Chinner
Date: Tue Apr 06 2010 - 19:12:17 EST


On Tue, Apr 06, 2010 at 04:52:57PM +0200, Hans-Peter Jansen wrote:
> Hi Dave,
>
> On Tuesday 06 April 2010, 01:06:00 Dave Chinner wrote:
> > On Mon, Apr 05, 2010 at 01:35:41PM +0200, Hans-Peter Jansen wrote:
> > > >
> > > > Oh, this is a highmem box. You ran out of low memory, I think, which
> > > > is where all the inodes are cached. Seems like a VM problem or a
> > > > highmem/lowmem split config problem to me, not anything to do with
> > > > XFS...

[snip]

> Dave, I really don't want to disappoint you, but a lengthy bisection session
> points to:
>
> 57817c68229984818fea9e614d6f95249c3fb098 is the first bad commit
> commit 57817c68229984818fea9e614d6f95249c3fb098
> Author: Dave Chinner <david@xxxxxxxxxxxxx>
> Date: Sun Jan 10 23:51:47 2010 +0000
>
> xfs: reclaim all inodes by background tree walks

Interesting. I did a fair bit of low memory testing when I made that
change (admittedly none on a highmem i386 box), and since then I've
done lots of "millions of files" tree creates, traversals and destroys on
limited memory machines without triggering problems when memory is
completely full of inodes.

Let me try to reproduce this on a small VM and I'll get back to you.

> diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
> index 52e06b4..a76fc01 100644
> --- a/fs/xfs/linux-2.6/xfs_super.c
> +++ b/fs/xfs/linux-2.6/xfs_super.c
> @@ -954,14 +954,16 @@ xfs_fs_destroy_inode(
> ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIM));
>
> /*
> - * We always use background reclaim here because even if the
> - * inode is clean, it still may be under IO and hence we have
> - * to take the flush lock. The background reclaim path handles
> - * this more efficiently than we can here, so simply let background
> - * reclaim tear down all inodes.
> + * If we have nothing to flush with this inode then complete the
> + * teardown now, otherwise delay the flush operation.
> */
> + if (!xfs_inode_clean(ip)) {
> + xfs_inode_set_reclaim_tag(ip);
> + return;
> + }
> +
> out_reclaim:
> - xfs_inode_set_reclaim_tag(ip);
> + xfs_ireclaim(ip);
> }

I don't think that will work as expected in all situations - the
inode clean check there is not completely valid as the XFS inode
locks aren't held, so it can race with other operations that need
to complete before reclaim is done. This was one of the reasons for
pushing reclaim into the background....
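
To illustrate the kind of race meant here, a minimal user-space sketch
follows. It is not XFS code: struct demo_inode, its fields, and the demo_*
functions are all hypothetical, and a pthread mutex merely stands in for
the XFS inode locks. The unlocked variant tests the dirty flag with no lock
held, so another thread could dirty the inode between the check and the
teardown; the locked variant makes the check and the decision one atomic
step.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; not the real struct xfs_inode. */
struct demo_inode {
    pthread_mutex_t lock;    /* stands in for the XFS inode locks */
    bool dirty;              /* true while there is still state to flush */
    bool reclaim_tagged;     /* teardown deferred to background reclaim */
    bool torn_down;          /* teardown completed immediately */
};

void demo_inode_init(struct demo_inode *ip, bool dirty)
{
    assert(ip != NULL);
    pthread_mutex_init(&ip->lock, NULL);
    ip->dirty = dirty;
    ip->reclaim_tagged = false;
    ip->torn_down = false;
}

/*
 * Racy variant: the clean check runs with no lock held, so a concurrent
 * thread may set ip->dirty after the check but before the teardown.
 */
void demo_destroy_unlocked(struct demo_inode *ip)
{
    if (ip->dirty) {
        ip->reclaim_tagged = true;
        return;
    }
    /* Race window: ip->dirty may change here. */
    ip->torn_down = true;
}

/*
 * Locked variant: the check and the resulting decision happen under the
 * lock, so no other lock holder can change ip->dirty in between.
 */
void demo_destroy_locked(struct demo_inode *ip)
{
    pthread_mutex_lock(&ip->lock);
    if (ip->dirty)
        ip->reclaim_tagged = true;
    else
        ip->torn_down = true;
    pthread_mutex_unlock(&ip->lock);
}
```

Single-threaded, both variants behave the same: a dirty inode gets tagged
for reclaim and a clean one is torn down at once. They only differ when a
second thread modifies the inode concurrently.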

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx