Re: stat benchmark

From: Carl Henrik Lunde
Date: Thu Apr 24 2008 - 17:42:19 EST


On Thu, Apr 24, 2008 at 10:59 PM, Soeren Sandmann <sandmann@xxxxxxxxxxx> wrote:
[ about programs reading all inodes after readdir ]

> Unfortunately, performance of that operation kinda sucks. On my system
> (ext3), it produces:
>
> c-24-61-65-93:~% sudo ./a.out
> Time to readdir(): 0.307671 s
> Time to stat 2349 files: 8.203693 s
>
> 8 seconds is about 80 times slower than what a user perceives as
> "instantly" and slow enough that we really should display a progress
> bar if it can't be fixed.
>
> So I am looking for ways to improve this.
>
> Under the theory that disk seeks are killing us, one idea is to add a
> 'multistat' system call that would allow statting of many files at a
> time, which would give the disk scheduler more to work with.

I have experimented with the same problem. Another idea, which has
worked well for me, is to sort the entries returned by readdir by inode
number before statting them.

This works because:
- For most filesystems there is a high correlation between the inode
number and the sector on the disk.

- Most programs like your example handle the files in the order that
they are returned from readdir

- The time spent sorting is very small compared to the time spent on disk seeks

There are several possible ways to implement this:

- reorder the dirents in the kernel for each getdents call

- reorder the dirents in user space, for example by running
qsort in a libc wrapper

- in the file system, optimize the order before writing back a dirty directory

This applies not only to programs that merely stat files, but also to
programs that read them, such as file indexers, backup tools (tar), and
Nautilus generating thumbnails from JPGs.

--
Carl Henrik