Re: very poor ext3 write performance on big filesystems?

From: Theodore Tso
Date: Mon Feb 18 2008 - 10:03:55 EST


On Mon, Feb 18, 2008 at 04:18:23PM +0100, Andi Kleen wrote:
> On Mon, Feb 18, 2008 at 09:16:41AM -0500, Theodore Tso wrote:
> > ext3 tries to keep inodes in the same block group as their containing
> > directory. If you have lots of hard links, obviously it can't really
> > do that, especially since we don't have a good way at mkdir time to
> > tell the filesystem, "Psst! This is going to be a hard link clone of
> > that directory over there, put it in the same block group".
>
> Hmm, you think such a hint interface would be worth it?

It would definitely help ext2/3/4. An interesting question is whether
it would help enough other filesystems that it's worth adding.

> > necessarily removing the dir_index feature. Dir_index speeds up
> > individual lookups, but it slows down workloads that do a readdir
>
> But only for large directories right? For kernel source like
> directory sizes it seems to be a general loss.

On my todo list is a hack which does the sorting of directory entries
by inode number inside the kernel for smallish directories (say, less
than 2-3 blocks) where using the kernel memory space to store the
directory entries is acceptable, and which would speed up dir_index
performance for kernel source-like directory sizes --- without needing
to use the spd_readdir LD_PRELOAD hack.
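
To illustrate the idea from userspace, here is a minimal sketch (not
the kernel hack, and not the spd_readdir code itself) that reads a
whole directory, sorts the entries by inode number, and only then
stats them. The function names entries_by_inode and
stat_in_inode_order are made up for this example, and it assumes the
directory is small enough to hold in memory:

```python
import os


def entries_by_inode(path):
    """Return the entry names in `path`, sorted by inode number.

    With dir_index, readdir returns names in hash order, which is
    effectively random with respect to where the inodes live on disk.
    Sorting by inode number first should make the later stat() calls
    walk the inode table mostly in order.
    """
    with os.scandir(path) as it:
        # entry.inode() comes from the directory entry itself on Linux,
        # so collecting it does not need an extra stat() per file.
        pairs = [(entry.inode(), entry.name) for entry in it]
    pairs.sort()
    return [name for _, name in pairs]


def stat_in_inode_order(path):
    """lstat() every entry of `path` in inode order; return name -> stat."""
    return {
        name: os.lstat(os.path.join(path, name))
        for name in entries_by_inode(path)
    }
```

Whether this helps in practice depends on the directory size and on
what is already cached, so treat it as something to measure rather
than a guaranteed win.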

But yes, right now, if you know that your directories are almost
always going to be kernel-source-like in size, then omitting dir_index
is probably going to be a good idea.

- Ted