Re: [RFC][PATCH] try not to let dirty inodes fester

From: Dave Hansen
Date: Tue Oct 05 2010 - 11:26:03 EST


On Sat, 2010-10-02 at 21:32 +1000, Dave Chinner wrote:
> On Fri, Oct 01, 2010 at 12:14:49PM -0700, Dave Hansen wrote:
> >
> > I've got a bug that I've been investigating. The inode cache for a
> > certain fs grows and grows, despite running
> >
> > echo 2 > /proc/sys/vm/drop_caches
> >
> > all the time. Not that running drop_caches is a good idea, but it
> > _should_ force things to stay under control. That is, unless the
> > inodes are dirty.
>
> What's the filesystem, and what's the test case?

It's GPFS, which is a binary blob to me, unfortunately. I've seen some
of the same behavior with ext3, but only after changing some of the
dirty writeout tunables to absurd values. I think the complication with
GPFS in particular is that it doesn't use Linux's buffer cache. We
don't trigger any of the page-based dirty watermarks since no _pages_
are being dirtied.

I've seen it happen when creating or touching large numbers of empty
files. Yuri (cc'd) has seen it happen when mmap()'ing files but not
modifying them: since noatime is not set, the atime update alone
dirties the inode.

The original case that we were seeing was an NFS server serving up a
GPFS filesystem.

> > I think I'm seeing a case where the inode's dentry goes away, it
> > hits iput_final(). It is dirty, so it stays off the inode_unused
> > list waiting around for writeback.
>
> Right - it should be on the bdi->wb->b_dirty list waiting to be
> expired and written back or already on the expired writeback queue
> and waiting to be written again.
>
> > Then, the periodic writeback happens, and we end up in
> > wb_writeback(). One of the first things we do in the loop (before
> > writing out inodes) is this:
> >
> > 	if (work->for_background && !over_bground_thresh())
> > 		break;
>
> Sure, but the periodic ->for_kupdate flushing should be writing
> any inode older than 30s and should be running every 5s. Hence the
> background writeback aborting should not be affecting the cleaning
> of dirty inodes, so I don't think this is the problem you are
> looking for.

Yeah, I think you're right. I missed that call site when I was going
through it.

> Without knowing what filesystem or what you are doing to grow the
> inode cache, it's pretty hard to say much more than this....

Thanks for looking at it. I'm trying to see if I can reproduce any of
this with any of the in-tree fs's.

-- Dave
