Re: fsync on large files

Colin Plumb (colin@nyx.net)
Fri, 19 Feb 1999 01:51:37 -0700 (MST)


Regarding the shared-metadata problem, here's a possible solution.

People have pointed out that you can just keep all the dirty shared
metadata blocks on a single list and write them all out on each fsync.

The observation here is that syncing more data than strictly required
preserves correctness; the only cost is efficiency.

To get the same correctness while writing out far fewer blocks that
don't need it, you can use a Bloom filter.

When you modify shared data on behalf of inode n, you hash n into a
value with exactly two bits set. Then you OR that hash into a mask
word associated with the block. (The mask might live in the block's
otherwise-unused inode number slot, for example.)
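
To make the marking step concrete, here's a minimal sketch in C. The
structure, the field and function names (shared_block, fsync_mask,
inode_mark_hash), and the particular mixing function are all
illustrative; assume only that each shared block carries one spare
32-bit word:

#include <stdint.h>

struct shared_block {
	uint32_t fsync_mask;	/* Bloom filter of interested inodes */
	/* ... the shared metadata itself ... */
};

/* Derive a 32-bit value with exactly two bits set from inode number n.
 * Any cheap mixing function will do; this one is purely illustrative. */
static uint32_t inode_mark_hash(uint32_t n)
{
	uint32_t h = n * 2654435761u;	/* Knuth's multiplicative hash */
	unsigned b1 = h & 31;		/* first bit position, 0..31 */
	unsigned b2 = (h >> 5) & 31;	/* second bit position */

	if (b2 == b1)			/* force two distinct bits */
		b2 = (b2 + 1) & 31;
	return (1u << b1) | (1u << b2);
}

/* Called when a modification on behalf of inode n dirties the block. */
static void mark_block_for_inode(struct shared_block *b, uint32_t n)
{
	b->fsync_mask |= inode_mark_hash(n);
}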

When it comes time to fsync inode n, you walk the list of dirty shared
blocks, testing whether (hash & mask) == hash. (I.e. !(hash & ~mask).)
This is always true if the block contains modifications pertaining to
inode n, because the hash was ORed into the mask, so there are no
false negatives. If the block contains no such modifications, the test
probably fails and the block can be skipped.
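
The fsync-time test, continuing the sketch above (it reuses
struct shared_block and inode_mark_hash; write_block() and the array
walk stand in for whatever list and write-out primitives the real code
would use):

#include <stddef.h>

extern void write_block(struct shared_block *b);	/* placeholder */

static int block_may_concern_inode(const struct shared_block *b, uint32_t n)
{
	uint32_t hash = inode_mark_hash(n);

	/* Never false for a block marked on behalf of inode n, since
	 * its hash bits were ORed into fsync_mask; a false positive
	 * merely costs one harmless extra write. */
	return (hash & b->fsync_mask) == hash;	/* !(hash & ~mask) */
}

static void fsync_shared_blocks(struct shared_block **list, size_t count,
				uint32_t inode_n)
{
	for (size_t i = 0; i < count; i++) {
		if (block_may_concern_inode(list[i], inode_n)) {
			write_block(list[i]);
			list[i]->fsync_mask = 0;  /* block is clean again,
						     so presumably the mask
						     resets here */
		}
	}
}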

You have to choose the number of bits set in the hash in combination
with the number of different inodes likely to take an interest in any
one block, so as to optimize the effectiveness of the filter; but any
reasonable choice will get rid of a great many of the false alarms.
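
To put rough, back-of-the-envelope numbers on it (illustrative
arithmetic, nothing more): with a 32-bit mask and two bits per inode,
after eight distinct inodes have marked a block, about 1 - (30/32)^8,
or roughly 40%, of the mask bits are set. An unrelated inode's two-bit
hash then matches with probability around 0.4^2, i.e. about 16%, so
the filter suppresses some five out of six of the would-be spurious
writes.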

It's a fast, clever and elegant technique.

-- 
	-Colin
