Re: File system compression, not at the block layer
From: Timothy Miller
Date: Fri Apr 23 2004 - 13:12:30 EST
Theodore Ts'o wrote:
On Fri, Apr 23, 2004 at 05:30:21PM +0000, Miquel van Smoorenburg wrote:
In article <408951CE.3080908@xxxxxxxxxxxxxx>,
Timothy Miller <miller@xxxxxxxxxxxxxx> wrote:
Well, why not do the compression at the highest layer?
[...] doing it transparently and for all files.
It's been done (see the above URL), but given how cheap disk space has
gotten, and how the speed of CPU has gotten faster much more quickly
than disk access has, many/most people have not been interested in
trading off performance for space. As a result, there are race
conditions in e2compr (which is why it never got merged into
mainline), and there hasn't been sufficient interest to either (a)
forward port e2compr to more recent kernel revisions, or (b) find and
fix the race conditions.
Well, performance has been my only interest. Aside from the embedded
space (which already uses cramfs or something, right?), the only real
benefit to FS compression is the fact that it would reduce the amount of
data that you have to read from disk. If your IDE drive gives you
50MB/sec, and your file compresses by 50%, then you get 100MB/sec
reading that file.
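
A rough sketch of that arithmetic, in Python. The function name and the optional decompression cap are my own assumptions for illustration; the cap models the case where the CPU, rather than the disk, becomes the limit.

```python
def effective_read_rate(disk_mb_per_s, compressed_fraction,
                        decompress_mb_per_s=float("inf")):
    """Rate at which *uncompressed* data reaches the reader, in MB/s.

    compressed_fraction is compressed size / original size, so a file
    that "compresses by 50%" has compressed_fraction = 0.5.
    decompress_mb_per_s is a hypothetical cap on decompressor output;
    the default assumes decompression is never the bottleneck.
    """
    if not 0 < compressed_fraction <= 1:
        raise ValueError("compressed_fraction must be in (0, 1]")
    # Every MB read from the platter expands to 1/compressed_fraction MB.
    from_disk = disk_mb_per_s / compressed_fraction
    return min(from_disk, decompress_mb_per_s)


print(effective_read_rate(50, 0.5))      # 100.0
print(effective_read_rate(50, 0.5, 80))  # 80, the decompressor is the limit
```

This counts sequential transfer only; it says nothing about seeks.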
In a private email, one gentleman (who can credit himself if he likes)
pointed out that compression doesn't reduce the number of seeks, and
since seek times dominate, the benefit of compression would diminish.
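
To put numbers on that objection, here is a toy model in Python. Everything in it is an assumption for illustration: the function names, the 10 ms seek figure, and the premise that compression leaves the number of extents (and hence seeks) unchanged.

```python
def read_time_s(file_mb, extents, seek_ms, disk_mb_per_s,
                compressed_fraction=1.0):
    """Seconds to read a file: one seek per extent plus transfer time.

    Compression shrinks only the transfer term; the seek term is
    assumed to stay the same.
    """
    seek_time = extents * seek_ms / 1000.0
    transfer_time = file_mb * compressed_fraction / disk_mb_per_s
    return seek_time + transfer_time


def speedup(file_mb, extents, seek_ms, disk_mb_per_s, compressed_fraction):
    """Ratio of uncompressed read time to compressed read time."""
    plain = read_time_s(file_mb, extents, seek_ms, disk_mb_per_s)
    packed = read_time_s(file_mb, extents, seek_ms, disk_mb_per_s,
                         compressed_fraction)
    return plain / packed


# Hypothetical drive: 10 ms per seek, 50 MB/s, files compress by 50%.
print(speedup(100, 1, 10, 50, 0.5))   # large sequential file
print(speedup(0.1, 1, 10, 50, 0.5))   # 100 KB file, seek-dominated
```

With these made-up figures the large file reads almost twice as fast, while the small file gains less than 10%, because the single seek already costs more than the transfer.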
SO... in addition to the brilliance of AS (the anticipatory I/O scheduler), is there anything else that
can be done (using compression or something else) which could aid in
reducing seek time?
Nutty idea: Interleave files on the disk. So, any given file will have
its blocks allocated at, say, intervals of every 17 blocks. Make up for
the sequential performance hit with compression or something, but to get
to the beginning of groups of files, seek time is reduced. Maybe.
Probably not, but hey. :)
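
For what it's worth, the block mapping I have in mind would look something like this in Python. It is purely hypothetical: the names, and the idea of giving each file a "slot" within a group of 17, are made up for illustration and come from no existing file system.

```python
STRIDE = 17  # the "every 17 blocks" interval from the idea above


def physical_block(group_start, slot, logical_block, stride=STRIDE):
    """Map a file's logical block number to a physical block number.

    A group interleaves `stride` files. The file in `slot` owns every
    stride-th block of the group, starting at group_start + slot.
    """
    if not 0 <= slot < stride:
        raise ValueError("slot must be in range(stride)")
    return group_start + slot + logical_block * stride


# The first blocks of all 17 files in a group sit next to each other...
print([physical_block(0, s, 0) for s in range(STRIDE)])
# ...while one file's own blocks are spread 17 apart.
print([physical_block(0, 3, i) for i in range(3)])  # [3, 20, 37]
```

Reaching the start of any file in the group costs at most a 16-block hop, but reading one file end to end touches 17 times as much of the platter. That is the sequential hit compression would have to pay for.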
Another idea is to actively fragment the disk based on access patterns.
The most frequently accessed blocks are grouped together so as to
maximize overall throughput. The problem with this is that, well, say
boot time is critical -- booting wouldn't happen enough to get enough
attention so that its blocks get optimized (they would get dispersed as
a result of more common activities); but database access could benefit
in the long-term.
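
A minimal sketch of that placement policy in Python, assuming we have some log of block accesses to work from. The function name, the log format, and the notion of a fixed-size "hot region" are all hypothetical.

```python
from collections import Counter


def plan_hot_layout(access_log, hot_region_start, hot_region_blocks):
    """Decide which blocks to move into a contiguous hot region.

    access_log is an iterable of block numbers, one entry per access.
    Returns {old_block: new_block} for the most frequently accessed
    blocks, packed next to each other from hot_region_start.
    """
    counts = Counter(access_log)
    # most_common() yields (block, count) pairs, highest count first.
    hottest = [blk for blk, _ in counts.most_common(hot_region_blocks)]
    return {blk: hot_region_start + i for i, blk in enumerate(hottest)}


# Block 900 is hit three times, 12 twice, 4500 (say, a boot file) once.
log = [900, 12, 900, 4500, 900, 12]
print(plan_hot_layout(log, 0, 2))  # {900: 0, 12: 1}
```

Block 4500 never makes it into the hot region, which is exactly the boot-time problem: anything read rarely loses out to whatever is read often.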