Re: (reiserfs) Re: LVM / Filesystems / High availability

Theodore Y. Ts'o (tytso@mit.edu)
Wed, 24 Jun 1998 13:51:49 -0400

Date: Wed, 24 Jun 1998 13:17:33 +0200
From: Florian Lohoff <flo@quit.mediaways.net>

There are logically no holes. If you have to take a PE out of service in
the middle you have multiple choices:

1. Shrink the filesystem by the size of one PE and exchange
the PE in the middle for the PE freed at the end
(Assuming Grow-/Shrinking always happens at the end, otherwise
you *NEED* PE/LV aware FSes)
2. Replace the PE by another PE in the Volume Group, being
completely transparent to the Filesystem.

OK, suppose I have a 54 terabyte filesystem, with all PEs in use, and
I need to remove a 9 gigabyte disk from the middle of the filesystem,
because it's failing.

I can't do (2) because I don't have a spare PE to use.
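
To make the constraint on (2) concrete, here is a minimal sketch of an
extent map, assuming a toy model in which a logical volume is just a list
of (disk, PE index) pairs. The class and method names are hypothetical and
are not taken from any real LVM implementation. Remapping is invisible to
the filesystem, but it only works while spare PEs exist.

```python
# Toy model of an LVM-style extent map. All names here are hypothetical.

class VolumeGroup:
    def __init__(self, pes_per_disk):
        # pes_per_disk: dict of disk name -> number of physical extents (PEs)
        self.free = [(disk, i)
                     for disk, n in sorted(pes_per_disk.items())
                     for i in range(n)]
        self.lv_map = []  # logical extent index -> (disk, PE index)

    def extend_lv(self, count):
        """Append `count` logical extents, taking free PEs in order."""
        if count > len(self.free):
            raise RuntimeError("not enough free PEs")
        self.lv_map.extend(self.free[:count])
        del self.free[:count]

    def evacuate_disk(self, disk):
        """Remap every logical extent on `disk` to a spare PE elsewhere."""
        spare = [pe for pe in self.free if pe[0] != disk]
        victims = [le for le, pe in enumerate(self.lv_map) if pe[0] == disk]
        if len(spare) < len(victims):
            raise RuntimeError(
                f"need {len(victims)} spare PEs, only {len(spare)} available")
        for le, new_pe in zip(victims, spare):
            # A real implementation would copy the extent's data here.
            self.lv_map[le] = new_pe
        used = set(spare[:len(victims)])
        # The evacuated disk's PEs leave the pool along with the consumed spares.
        self.free = [pe for pe in self.free
                     if pe[0] != disk and pe not in used]
        return len(victims)
```

With every PE already mapped into the logical volume, evacuate_disk() has
nothing to remap onto and raises, which is the situation described above.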

I could do (1) but that would mean temporarily copying *more* data to
the failing 9 gigabyte disk as part of the compaction process, thus
putting that data at risk. The increased disk activity would also make
it more likely for the disk to fail completely. Once the filesystem has
been compacted you now have space to exchange the PE, but that involves
a needless 9 gigabyte copy on top of the overhead of the filesystem
compaction.
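
As a rough illustration of the copy overhead, here is a back-of-the-envelope
sketch. It assumes a simplified model in which the block-level PE exchange
copies every byte of the extent (the volume layer cannot tell live blocks
from free ones), while a filesystem-level move copies only live data. The
function names and the 6 gigabyte occupancy figure are hypothetical.

```python
GB = 10**9

def shrink_and_exchange_copy(disk_size, tail_live):
    """Bytes copied by option (1) under the assumptions stated above."""
    compaction = tail_live   # live data in the tail region is relocated
    exchange = disk_size     # the whole PE is then copied block for block
    return compaction + exchange

def vacate_copy(failing_live):
    """Bytes copied when only live blocks are moved off the failing disk."""
    return failing_live

disk = 9 * GB
live = 6 * GB  # hypothetical amount of live data in each 9 GB region

print(shrink_and_exchange_copy(disk, live) // GB)  # 15
print(vacate_copy(live) // GB)                     # 6
```

The model ignores where the compacted data lands, so it understates the
problem: some of the relocated blocks may be written to the failing disk.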

If the filesystem is LVM aware, and is using structured block addresses,
then all it needs to do is to stop allocating blocks in that particular
PE, and start vacating blocks and inodes out of the failing disk to
others, on-line. This is faster and more robust.
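
The idea can be sketched as follows, assuming a toy filesystem in which a
structured block address is simply a (PE, offset) pair instead of a flat
block number. All class, method, and field names are hypothetical; no
existing filesystem is being described. Note that this needs free blocks
somewhere in the filesystem, but not a spare PE.

```python
# Toy model of an LVM-aware block allocator. All names are hypothetical.

class Filesystem:
    def __init__(self, num_pes, blocks_per_pe):
        # Free offsets per PE; an address is a (pe, offset) pair.
        self.free = {pe: set(range(blocks_per_pe)) for pe in range(num_pes)}
        self.draining = set()  # PEs closed to new allocations
        self.files = {}        # file name -> list of (pe, offset) addresses

    def _take(self, nblocks):
        """Reserve `nblocks` free blocks, skipping any draining PE."""
        candidates = [(pe, off)
                      for pe in sorted(self.free) if pe not in self.draining
                      for off in sorted(self.free[pe])]
        if len(candidates) < nblocks:
            raise RuntimeError("out of space")
        chosen = candidates[:nblocks]
        for pe, off in chosen:
            self.free[pe].remove(off)
        return chosen

    def allocate(self, name, nblocks):
        addrs = self._take(nblocks)
        self.files.setdefault(name, []).extend(addrs)
        return addrs

    def vacate(self, pe):
        """Stop allocating in `pe`, then move its live blocks elsewhere."""
        self.draining.add(pe)
        live = [(name, i)
                for name, addrs in self.files.items()
                for i, addr in enumerate(addrs) if addr[0] == pe]
        targets = self._take(len(live))
        for (name, i), new_addr in zip(live, targets):
            # A real filesystem would copy the block and update metadata here.
            self.files[name][i] = new_addr
        del self.free[pe]  # nothing references this PE any more
        return len(live)
```

Because only the addresses of live blocks change, nothing is written to the
PE being vacated, and untouched files keep their addresses.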

It doesn't matter where you delete PE/Physical Disks. You will always
be able to just replace them transparently, either with the PEs freed
at the end by a filesystem shrink or with PEs on other PVs in
the same Volume Group.

This is fine in theory, but it involves a lot of needless disk copies,
which, if the disk you want to decommission is failing, is the last
thing you want to do.

IMHO it is easier to start from the beginning, having thought about
these nifty features beforehand, and then implement them cleanly. As I
found when I went searching after Linux-Kongress, there are many
filesystem projects for Linux which have nice features and
performance tweaks. I'd like to just keep compatibility in newer
Linux versions with old filesystems (ext2) and begin with a new,
stable, feature-(over)loaded fs in next-generation Linux.

There are plenty of filesystem implementation efforts which are trying
to do this. Reiserfs, dts, lfs, among others. In my experience there
are far more filesystem projects started than actually finish, although
I'll be the first to acknowledge that there are some pretty sharp people
working on some of the other filesystems. In the true Bazaar model,
we'll see which approach results in a stable, performant and robust
filesystem first. It should be an interesting experiment.

- Ted

