Re: [PATCH] Re: Solaris 100K TCP connections, good example? was:[Fwd:

Chuck Lever (cel@monkey.org)
Wed, 29 Sep 1999 15:30:29 -0400 (EDT)


On Wed, 29 Sep 1999, Alexander Viro wrote:
> On Wed, 29 Sep 1999, Chuck Lever wrote:
> > free_inodes() walks the in_use list looking for inodes with a zero
> > reference count. it's not clear to me how a zero reference count inode can
> > get onto the in_use list, though. i would think that iput is careful
> > enough to move inodes whose reference count goes to zero to the unused
> > list. all other references to the i_count field in the kernel i can find
>
> Wrong. unused is for freed inodes. Inodes in in_use with zero i_count can
> be immediately taken (without read_inode()). It _is_ needed - take it away
> and you will get a helluva lot of fs accesses. Think of soft pagefaults.

certainly we don't want to lose the "cache" nature of the inode cache. i'm
guessing the cache is implemented this way to reduce the amount of list
manipulation required when the system has enough inodes that it doesn't
need to recycle them.

one of the difficulties i had understanding this code is the
seeming interchangeability of the terms "unused" and "free". there are
more than simply the three states alluded to in the block comment at the
top of fs/inode.c:

1. in-use fs inode -- it's hashed, and on the in_use list
2. dirty in-use fs inode -- it's hashed, on the in_use list, and
   in some super block's dirty inode list
3. in-use socket inode -- it's not hashed, but on the in_use list
4. zero count fs inode -- it's hashed, and on the in_use list, but
   is a target for reclamation if i_nrpages is zero (*)
5. unused, or "free" inode -- it's not hashed, and not on the in_use
   list, but is on the unused list

(*) since its use count is zero, it can also be said that this type of
inode is "unused."
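
to make the five states above concrete, here is a minimal user-space sketch
that classifies a simplified inode record. it is not the kernel's struct
inode: the toy_inode type, its boolean membership flags, and the function
names are hypothetical and exist only for illustration. only i_count and
i_nrpages are names taken from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an inode (not the kernel struct). */
struct toy_inode {
    int           i_count;    /* reference count */
    unsigned long i_nrpages;  /* cached pages still attached */
    bool          hashed;     /* findable through the inode hash table */
    bool          on_in_use;  /* linked on the in_use list */
    bool          on_dirty;   /* linked on a super block's dirty list */
};

enum toy_state {
    TOY_IN_USE_FS,      /* state 1 */
    TOY_DIRTY_FS,       /* state 2 */
    TOY_IN_USE_SOCKET,  /* state 3 */
    TOY_ZERO_COUNT_FS,  /* state 4 */
    TOY_UNUSED,         /* state 5 */
    TOY_INCONSISTENT    /* a combination the list above does not cover */
};

/* Map list and hash membership onto the five states described above. */
static enum toy_state toy_classify(const struct toy_inode *in)
{
    if (!in->on_in_use)
        return in->hashed ? TOY_INCONSISTENT : TOY_UNUSED;
    if (!in->hashed)
        return TOY_IN_USE_SOCKET;
    if (in->i_count == 0)
        return TOY_ZERO_COUNT_FS;
    return in->on_dirty ? TOY_DIRTY_FS : TOY_IN_USE_FS;
}

/* A zero count fs inode is a reclamation target only with no pages left. */
static bool toy_reclaimable(const struct toy_inode *in)
{
    return toy_classify(in) == TOY_ZERO_COUNT_FS && in->i_nrpages == 0;
}
```

in this model, a hashed inode on the in_use list with i_count == 0 and
i_nrpages == 0 is the only kind toy_reclaimable() accepts.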

> iput() moves the thing to unused if it is unhashed (and thus can not be
> ever found via hash lookup). Real cleaning is left to free_inodes() when
> and if necessary.

the problem with free_inodes() is there is no LRU recycling. it just
picks up *all* the zero count inodes and moves them to the unused list.
that's as bad for file system performance as what you described above. i
think the behavior you want is to grab the N least recently used zero
reference count inodes, where N is a number similar in magnitude to the
number of inodes that are allocated in grow_inodes() via gfp.
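
a rough user-space sketch of that "grab the N least recently used" idea
follows. it assumes the in_use list is kept in LRU order (most recently used
at the head), which is an assumption of this sketch and not something the
current code does. the toy_inode type, the list helpers, and
toy_reclaim_lru() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode stand-in; prev/next form a circular doubly linked list. */
struct toy_inode {
    struct toy_inode *prev, *next;
    int               i_count;    /* reference count */
    unsigned long     i_nrpages;  /* cached pages still attached */
    int               id;         /* label used only for illustration */
};

static void toy_list_init(struct toy_inode *head)
{
    head->prev = head->next = head;
}

static void toy_list_del(struct toy_inode *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* Insert at the head, i.e. the "most recently used" end. */
static void toy_list_add_head(struct toy_inode *n, struct toy_inode *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/*
 * Walk in_use from the tail (least recently used) and move at most 'goal'
 * zero count, page-free inodes to the unused list. Returns how many moved.
 */
static int toy_reclaim_lru(struct toy_inode *in_use, struct toy_inode *unused,
                           int goal)
{
    int moved = 0;
    struct toy_inode *cur = in_use->prev;

    while (cur != in_use && moved < goal) {
        struct toy_inode *older = cur->prev;  /* save before unlinking */

        if (cur->i_count == 0 && cur->i_nrpages == 0) {
            toy_list_del(cur);
            toy_list_add_head(cur, unused);
            moved++;
        }
        cur = older;
    }
    return moved;
}
```

the point of the 'goal' argument is that the scan stops early, so recently
used zero count inodes stay hashed and can still be found without a
read_inode().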

then the only reason to keep the in_use list is to make invalidation fast.
for this, it might be better to walk the hash table, since there's no need
to invalidate unfindable inodes, like socket inodes. the hash table
method also has the benefit that you can momentarily release the inode
lock after you search each bucket.
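
the bucket-at-a-time invalidation could look roughly like the user-space
sketch below. the lock is simulated with a counter so the sketch stays
self-contained; the toy_ names, the four-bucket table, and the modulo hash
are all hypothetical and are not the kernel's data structures. socket inodes
never appear here simply because they are never hashed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HASH_BUCKETS 4

/* Hypothetical inode stand-in, chained through a singly linked hash bucket. */
struct toy_inode {
    struct toy_inode *hash_next;
    int               dev;          /* device the inode belongs to */
    int               i_count;      /* reference count */
    int               invalidated;  /* set once dropped from the hash */
};

/* Simulated lock: records acquisitions and asserts on misuse. */
struct toy_lock { int held; int acquisitions; };

static struct toy_inode *toy_hash[TOY_HASH_BUCKETS];
static struct toy_lock   toy_inode_lock;

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = 1;
    l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = 0;
}

static void toy_hash_insert(struct toy_inode *in, unsigned long ino)
{
    unsigned long b = ino % TOY_HASH_BUCKETS;

    in->hash_next = toy_hash[b];
    toy_hash[b] = in;
}

/*
 * Unhash every zero count inode on 'dev', one bucket at a time. The lock is
 * dropped between buckets so other users are not held off for the whole
 * scan. Returns the number of still-referenced inodes found on 'dev'.
 */
static int toy_invalidate(int dev)
{
    int busy = 0;

    for (int b = 0; b < TOY_HASH_BUCKETS; b++) {
        toy_lock_acquire(&toy_inode_lock);
        struct toy_inode **pp = &toy_hash[b];
        while (*pp) {
            struct toy_inode *in = *pp;
            if (in->dev == dev && in->i_count == 0) {
                *pp = in->hash_next;   /* unlink from the bucket */
                in->hash_next = NULL;
                in->invalidated = 1;
                continue;
            }
            if (in->dev == dev)
                busy++;
            pp = &in->hash_next;
        }
        toy_lock_release(&toy_inode_lock);  /* window for other users */
    }
    return busy;
}
```

a real version would also have to cope with inodes being added to a bucket
that was already scanned while the lock was dropped; the sketch ignores that.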

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/linux-scalability/
