Re: (reiserfs) Re: LVM / Filesystems / High availability

Florian Lohoff (flo@quit.mediaways.net)
Wed, 24 Jun 1998 15:44:18 +0200


On Wed, Jun 24, 1998 at 12:37:32PM +0100, Stephen C. Tweedie wrote:
> Hi,
>
> On Tue, 23 Jun 1998 18:40:06 +0200, Florian Lohoff
> <flo@quit.mediaways.net> said:
>
> >> Virtual disks for redundancy or performance are just fine, but
> >> when it comes to filesystem sizing, the fs has to be actively involved
> >> in any change. Given that, we can actually implement the whole thing in
> >> the filesystem.
>
> > I would not like to see a special ext2 implementation for resizing and
> > holes, where ext2 has to care about "physical devices" and the
> > reorganization of blocks.
>
> > The LVM approach with the "virtual block device" makes many things
> > much easier. You can keep filesystem code very simple,
>
> I believe things are actually _simpler_ if you keep it in the
> filesystem.
>
> > and the LVM code also isn't very complex. The only thing you need to
> > take care of is the block allocation of the LVM, which you can make as
> > complex and intelligent as you like, but a bug in there will NOT cause
> > data to be lost or corrupted. The PEs will just not be where they
> > should be, and it would be just a performance/reliability problem which
> > you could fix on the fly without any filesystem interaction. Also,
> > creating a mirror, raid5 or stripe of a simple filesystem on the fly
> > would be VERY easy, MUCH easier than doing it at the filesystem level.
>
> Mirroring, raid and striping are all things which should be done in the
> device layers, as I said in the first place. However, as soon as we
> start talking about online resizing (and it is specifically online
> resizing which I am talking about --- offline is an entirely separate
> issue), then the filesystem really has got to be involved and
> interacting with another component just makes it more complex.

Then you have already inserted a kind of virtual block device for
raid levels of any kind. You have to distribute block accesses over
a set of disks, so why do the same thing twice? If you
had to resize a volume on a mirror set, you would still
have to talk to your "device", because it also has to resize ... no gain ...

> >> Miguel's prototype LVM stuff works by letting you mke2fs a new partition
> >> and then daisy-chain that new device on to the end of the existing
> >> filesystem, at run time, while it is all mounted. Removing such a
>
> > This is still (I am sure) very difficult and not as easy as it
> > sounds here.
>
> It IS easy. ext2fs already has multi-layered allocation; allocating
> inodes or blocks first has to search for a suitable block group, then
> for a free entry in that group. Adding extra code to the block group
> search to scan multiple bound filesystems is easy. Adding code to the
> inode or block lookup to partition the name spaces over those bound
> filesystems is easy. This work is _done_. It works just fine. It's
> the management issues which are harder --- working out how to deal with
> mounting filesystems; where do you specify the filesystem devices, in
> superblock or fstab; what to do if a device is inaccessible, and so on.

You do not have to care about those issues in the filesystem if the LVM
already has all those problems solved and working.

> > And still, you are tightly bound to physical devices (read: partitions,
> > drives etc.)
>
> Yes. That's why I'd really like to know if this is a major problem. As
> far as I am concerned, simplicity really does dictate doing this just in
> the filesystem. We already get much independence from physical disks by

It doesn't. You have to write many more lines of code to put this into
the filesystem than into a separate abstraction layer (the LVM). And you
still have to write more code if you want to add this ability to any
future filesystem.

> having things such as raid in the block device layers. Is there any
> compelling need to be able to have such fine grained control over
> partition allocation as you get from an LVM, given that the ultimate aim
> with the filesystem-based solution should be able to let you (a) add a
> new partition to the bound filesystem set, and then (b) remove one or
> more of the original partitions, to achieve a similar effect? The
> biggest downside is that it is likely to be more expensive in terms of
> performance to do the removal and remapping from within the fs than from
> an LVM.

The issue I have with tying ext2 to devices is that, in my
view, Linux/Unix consists of reusable, encapsulated building blocks.
With this approach you have the functionality only in ext2, not in any
other filesystem type or even swap. I would hate to see this, as in my
opinion it goes completely against the Unix philosophy.

> > I don't think that this complicates things. We only need some
> > interaction between filesystems and devices. Like the filesystem
> > telling the device "I would like you to shrink by 4 GB, tell me if you
> > are able to do this" "Could you please shrink now by 4 GB, tell me
> > when ready" ...
>
> No no no. The shrinking of the filesystem is HARD. We have to

Above I was describing (very roughly) the communication between
filesystem and LVM (assuming we only shrink at the end, which is enough
in my view).

> implement it whatever we do. Shrinking a bunch of blocks off of the end
> of the filesystem is no easier than shrinking a set from the middle of
> the filesystem, so if we have a filesystem composed of bound partitions,

This might be true for ext2. There are many other filesystems which might
take advantage of a resizable block device.

> then removing one from the middle doesn't require any LVM magic to make
> it appear as if a block device is simply shrinking. Once the shrinking

What magic? You don't REMOVE PEs from the middle ... you replace them
with other PEs on other PVs, or even on the network, without any interaction
with the filesystem.

> is working, it is simple just to evict a partition from the bound set,
> without having to interact with any other software.

This is what we have right now. You can take partitions out of
service without interaction with the FS or userspace.

> The filesystem-based solution also allows you to do this sort of
> management to ANY filesystem, regardless of whether or not you thought
> you'd need the feature when you first mounted it.

To any filesystem, if you code all the extra stuff into it. For
EVERY SINGLE FILESYSTEM we will have to code very complex
"hole in the filesystem" logic, not just "ok, take some blocks off the end".

Most filesystems are designed to be contiguous, so it will be
much easier to cut off at the end or the beginning.

> There are some _really_ impressive efforts going on right now, such as
> reiserfs, to develop new filesystems for the next generation. Ext2fs
> cannot afford to follow if that hurts stability.

Agreed.

> The result is that we've got a lot of good stuff coming for ext2fs,
> including major performance and reliability improvements such as the
> btree and journalling work, but massive overhauls of filesystem code
> deserve to be part of the next generation of filesystems, NOT ext2fs.

Over the last few weeks I read a lot about filesystems and looked at the
complete ext2fs code and design. As far as I can see, ext2 has many design
issues which make it very difficult to achieve, within ext2 itself, the
flexibility of an LVM with a filesystem on top that is able to grow/shrink.

> > This has led Microsoft to install a 32-bit OS into a 16-bit FAT
> > partition. We do not need a quick return on investment and
> > commercial success, so we can choose the BEST TECHNICAL SOLUTION,
> > and we don't need to make compromises.
>
> I _am_ looking for the best technical solution here. However, amongst
> solutions of equal merit, I will take the simplest every time. For
> redundancy/striping, that means doing it in the LVM. For filesystem
> size management, I believe that means doing it in the fs.

Ok .. we have to combine the two, sure, but why let ext2 deal with
different block devices etc.? Why not let the LVM do this? Let ext2
assume "we have a contiguous (possibly virtual) block allocation scheme",
which makes things easier. Then build resizing that removes space at the
end, which does everything we need (including LVM PE moving/removal).
As I said, we need an extended filesystem<>block device communication
(for filesystem size management).

If we want the flexibility of an LVM, with exchangeable hard drives,
partial hard drives, raid levels etc., we need both: a filesystem
flexible enough to grow/shrink, and a logical block manager which
gives us block devices that can release space on the physical devices
etc.

Flo

-- 
Florian.Lohoff@mediaWays.net			+49-5241-80-7085
aka flo@mini.gt.owl.de			@HOME	+49-5241-470566
