Re: Filesystem optimization..

Michael O'Reilly (michael@metal.iinet.net.au)
30 Dec 1997 14:17:24 +0800


ebiederm+eric@npwt.net (Eric W. Biederman) writes:
>
> MR> There are around 390,000 directories holding those files. Just how big
> MR> did you want the directory cache to get!?
>
> The default size is 128 entries per level, making for a total of 256
> entries in the two-level cache, in the stable kernels. It might be
> worth increasing DCACHE_SIZE some. The development series seems to
> increase this to about 1024, and extends it with chaining.
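For concreteness, here's a toy sketch of a fixed-size, two-level name
cache of the kind described above -- not the real kernel code, and the
names name_cache_lookup/name_cache_add are made up for illustration.
New entries land in level 1; a repeat hit promotes an entry to level 2,
so 256 slots total have to serve every lookup on the box.

#include <string.h>

#define CACHE_SIZE 128          /* slots per level, 256 total */

struct name_entry {
    unsigned long dir_ino;      /* inode number of the directory */
    char name[32];              /* path component being looked up */
    unsigned long ino;          /* cached lookup result */
    int valid;
};

static struct name_entry level1[CACHE_SIZE];
static struct name_entry level2[CACHE_SIZE];

static unsigned int hash(unsigned long dir_ino, const char *name)
{
    unsigned int h = (unsigned int)dir_ino;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % CACHE_SIZE;
}

/* Returns the cached inode number, or 0 on a miss. */
unsigned long name_cache_lookup(unsigned long dir_ino, const char *name)
{
    unsigned int h = hash(dir_ino, name);

    if (level2[h].valid && level2[h].dir_ino == dir_ino &&
        strcmp(level2[h].name, name) == 0)
        return level2[h].ino;

    if (level1[h].valid && level1[h].dir_ino == dir_ino &&
        strcmp(level1[h].name, name) == 0) {
        level2[h] = level1[h];  /* promote a repeat hit to level 2 */
        level1[h].valid = 0;
        return level2[h].ino;
    }
    return 0;                   /* miss: caller must go to the disk */
}

/* New entries always land in level 1, evicting whatever hashed there. */
void name_cache_add(unsigned long dir_ino, const char *name,
                    unsigned long ino)
{
    unsigned int h = hash(dir_ino, name);

    level1[h].dir_ino = dir_ino;
    strncpy(level1[h].name, name, sizeof(level1[h].name) - 1);
    level1[h].name[sizeof(level1[h].name) - 1] = '\0';
    level1[h].ino = ino;
    level1[h].valid = 1;
}

With 3 million files contending for 256 slots, most lookups can't hit
no matter what the replacement policy does.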

This still means that the cache hit rate would be pretty
appalling. Even with 70% cache locality, you're missing an awful
lot.
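Back-of-envelope on what a miss costs (illustrative figures, not
measurements from this thread): a hit is a RAM lookup, while a miss is
roughly two seeks -- the directory block plus the separate inode block
discussed below -- at ~10ms apiece on a disk of this era.

#include <stdio.h>

int main(void)
{
    /* Illustrative numbers: a hit is essentially free, a miss costs
     * roughly two seeks (directory block + separate inode block). */
    double t_hit_ms  = 0.001;
    double t_miss_ms = 20.0;
    double rates[]   = { 0.30, 0.70, 0.95 };

    for (int i = 0; i < 3; i++) {
        double t = rates[i] * t_hit_ms + (1.0 - rates[i]) * t_miss_ms;
        printf("hit rate %2.0f%% -> effective open() cost %5.2f ms\n",
               rates[i] * 100.0, t);
    }
    return 0;
}

Even at 70% hits the effective open() still costs ~6ms: the miss term
dominates until the hit rate gets very close to 100%.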

> MR> So: Given that you _are_ going to get a cache miss, how do you speed
> MR> it up? The obvious way is to try and eliminate the separate inode
> MR> seek.
>
> Another thing in the area of seeking that may be worth doing is
> checking to see if the kernel actually uses an elevator algorithm.
> I got the impression a while back that it does first-come-first-served
> for disk access. A little optimizing of the order (if it is cached)
> might help.

This only helps the overall throughput. It doesn't affect the latency
on a single open() (because each request completes before the next is
issued).
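For reference, a minimal sketch of the elevator idea (hypothetical
request structure and elevator_add name, not the kernel's actual
queue): keep pending requests sorted by block number so the head can
sweep the disk in one direction instead of seeking back and forth.

#include <stdio.h>
#include <stdlib.h>

struct request {
    unsigned long block;        /* target disk block */
    struct request *next;
};

/* Pending requests, kept sorted by block number (one elevator sweep). */
static struct request *queue;

void elevator_add(struct request *rq)
{
    struct request **p = &queue;

    /* Insert in ascending block order rather than FCFS. */
    while (*p && (*p)->block < rq->block)
        p = &(*p)->next;
    rq->next = *p;
    *p = rq;
}

int main(void)
{
    unsigned long blocks[] = { 900, 20, 500 };

    for (int i = 0; i < 3; i++) {
        struct request *rq = malloc(sizeof *rq);
        rq->block = blocks[i];
        elevator_add(rq);
    }
    /* Prints 20 500 900: one sweep across the disk. */
    for (struct request *rq = queue; rq; rq = rq->next)
        printf("%lu\n", rq->block);
    return 0;
}

Note it only reorders requests that are queued at the same time. A
synchronous open() issues its directory and inode reads one at a time,
waiting for each, so there is never anything to reorder -- which is
exactly the latency point above.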

> But the truth is you should aim to have as much of your directory
> structure and inodes in RAM as you can. With only 3 million files and
> 1GB of RAM you could allocate 350 bytes per file. Of course something
> quite that simple would be silly, but at 3 million files you are not
> out of the range (except perhaps pocketbook-wise) of using RAM for a
> considerable cache, even on the intel architecture. With only 30,000
> directories you could probably keep them all in RAM; let's see, at 2K
> each that would be only 58M, a large but doable number.

With 10 times that number of directories, the intel arch starts losing
it. I've already got 512 meg of RAM, but around 200 meg is used by the
application (squid), so that leaves around 256 meg for disk cache. To
get all the metadata cached, I'd need a machine with around 1.5 gig
of RAM. Don't know about you, but motherboards holding that much are
hard to come by, that much RAM is bloody expensive, and PentiumPros
can only cache a gig of RAM anyway. :)
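Putting the thread's numbers in one place (the ~350 bytes/file is
Eric's back-of-envelope budget above; the squid and RAM figures are
from this message):

#include <stdio.h>

int main(void)
{
    double files          = 3e6;    /* files in the cache */
    double bytes_per_file = 350.0;  /* Eric's rough metadata budget */
    double squid_mb       = 200.0;  /* application footprint */
    double ram_mb         = 512.0;  /* installed RAM */

    double meta_mb = files * bytes_per_file / (1024.0 * 1024.0);

    printf("metadata to cache     : ~%.0f MB\n", meta_mb);   /* ~1000 MB */
    printf("left for disk cache   : ~%.0f MB\n",
           ram_mb - squid_mb);      /* ~256 MB after kernel overhead */
    printf("RAM for full metadata : ~%.1f GB\n",
           (meta_mb + squid_mb) / 1024.0);
    /* ~1.2 GB before directory blocks and kernel overhead --
     * call it ~1.5 GB, as above. */
    return 0;
}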

> If you want to play around, you could run my shmfs filesystem as a test.
> It has the deficiency that it loses everything at shutdown, but in
> every test I have run it seems to be as fast or faster than ext2. And
> it keeps all of its inodes in RAM, and all of the page information.
> It's at http://www.npwt.net/~ebiederm/files/shmfs-0.0.020.tar.gz.
> And that's my shameless plug for beta testers :)

Sorry, don't have that much RAM, and need the data on disk to survive
shutdowns.

Michael.