Re: (reiserfs) Re: LVM / Filesystems / High availability

Florian Lohoff (flo@quit.mediaways.net)
Thu, 25 Jun 1998 11:38:51 +0200

Jehova,
On Wed, Jun 24, 1998 at 04:22:03PM +0100, Stephen C. Tweedie wrote:
> Hi,
>
> On Wed, 24 Jun 1998 15:44:18 +0200, Florian Lohoff
> <flo@quit.mediaways.net> said:
>
> > On Wed, Jun 24, 1998 at 12:37:32PM +0100, Stephen C. Tweedie wrote:
>
> >> Mirroring, raid and striping are all things which should be done in
> >> the device layers, as I said in the first place. However, as soon as
> >> we start talking about online resizing, then the filesystem really
> >> has got to be involved and interacting with another component just
> >> makes it more complex.
>
> > Then you already inserted a kind of virtual block device for
> > raid levels of any kind. You have to distribute block accesses to
> > a set of disks, so why do the same thing twice?
>
> Because the block device is the right place to do performance-related
> stuff like striping, whereas all the filesystem needs to do is to spread
> out its data over all the available space --- which it does already.

So the already implemented linear mode is not a block-device issue but
rather a filesystem one, and should be removed?
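
To make the comparison concrete, here is a rough sketch of how a linear
(concatenating) mode and a striped mode could each translate a logical
block number into a (device, block) pair. The function names, device
sizes and chunk size are made up for illustration and are not taken
from the md driver.

```python
def map_linear(block, device_sizes):
    """Concatenate devices: fill device 0 first, then device 1, and so on."""
    if block < 0:
        raise ValueError("negative block number")
    for dev, size in enumerate(device_sizes):
        if block < size:
            return dev, block
        block -= size
    raise ValueError("block beyond end of volume")


def map_striped(block, num_devices, chunk_blocks):
    """Round-robin fixed-size chunks over equally sized devices."""
    chunk, offset = divmod(block, chunk_blocks)
    dev = chunk % num_devices          # which device holds this chunk
    dev_chunk = chunk // num_devices   # index of the chunk on that device
    return dev, dev_chunk * chunk_blocks + offset
```

Both are pure address translations below the filesystem; only the
layout policy differs.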

> > If you would have to resize a volume on a mirror set you still have to
> > talk to your "device" because it also has to resize ... no gain ...
>
> Not if you are using the mechanism we've got for ext2fs. If you want to
> extend a mirrored filesystem, then create a new mirrored device out of
> the extra space and add that on to the filesystem.

When I think of partitioning via a partition table, as nearly all OSes
do, I will find 1050 4MB partitions after I have used the system for 2
years ... great.

> >> It IS easy. ext2fs already has multi-layered allocation; allocating
> >> inodes or blocks first has to search for a suitable block group, then
> >> for a free entry in that group. Adding extra code to the block group
> >> search to scan multiple bound filesystems is easy.
>
> > You do not have to care about those issues in the filesystem if the
> > LVM has already got all those problems solved and working.
>
> The filesystem *already* deals with it at the block group basis.
> Extending it to deal with multiple devices is trivial (and done). The
> LVM cannot do it alone; the filesystem has got to be substantially
> involved if we start resizing its data structures. The filesystem, on
> the other hand, _can_ do it alone, with no extra complexity and very
> little extra code. We can implement a solution which requires no
> device-level support, and we will have that in 2.3.
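
As a toy model of the two-level search described above (first find a
suitable block group, then a free entry in it), extended to walk over
several bound devices: the class, the function and the "goal group"
starting point are assumptions for illustration and not the actual
ext2fs allocator.

```python
class BlockGroup:
    def __init__(self, nblocks):
        self.free = [True] * nblocks  # stand-in for the block bitmap

    def free_count(self):
        return sum(self.free)

    def alloc(self):
        """Take the first free block in this group, or return None."""
        for i, is_free in enumerate(self.free):
            if is_free:
                self.free[i] = False
                return i
        return None


def alloc_block(devices, goal=(0, 0)):
    """devices: list of lists of BlockGroup, one inner list per bound device.

    Level 1: walk the groups, starting at the goal group and wrapping
    across device boundaries, until one has free space.
    Level 2: pick a free entry inside that group.
    Returns (device, group, block), or None if everything is full.
    """
    order = [(d, g) for d, groups in enumerate(devices)
             for g in range(len(groups))]
    start = order.index(goal) if goal in order else 0
    for d, g in order[start:] + order[:start]:
        group = devices[d][g]
        if group.free_count() > 0:
            return d, g, group.alloc()
    return None
```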

Nobody says we don't need the filesystem to resize. But the filesystem
CAN'T do it without the block device, as otherwise (in case of shrink)
you will not be able to reuse the freed space, or (in case of grow)
you will overwrite another partition, as this might be at the
end of the existing one. ext2/3 WON'T be able to resize without
any block-device communication.
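
A minimal sketch of the check that has to happen somewhere below the
filesystem before an in-place grow, assuming partitions are simply
(start, length) pairs in sectors. The function name and the data
layout are hypothetical.

```python
def can_grow(partitions, index, extra, disk_size):
    """Return True if partition `index` can grow by `extra` sectors in place.

    partitions: list of (start, length) tuples in sectors.
    """
    start, length = partitions[index]
    new_end = start + length + extra
    if new_end > disk_size:
        return False
    for i, (other_start, other_len) in enumerate(partitions):
        if i == index:
            continue
        # Overlap test between [start, new_end) and the other partition.
        if other_start < new_end and start < other_start + other_len:
            return False
    return True
```

With two adjacent partitions, the first one can never grow in place,
which is exactly the case where the filesystem alone has no answer.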

> There are in fact fundamental properties of the ext2fs filesystem
> structure which make it hard to resize beyond certain limits as a single
> filesystem. In particular, for every 250MB of filesystem or so, you get
> an extra block of group descriptor information after the superblock, and
> those descriptor blocks are statically allocated. Growing those tables
> is HARD, potentially requiring relocating large amounts of space on the
> disk. Growing the filesystem by linking in new partitions is much
> easier.
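
The "every 250MB or so" figure can be reproduced with a little
arithmetic. The parameters below (1KB blocks, 32-byte group
descriptors, one bitmap block per group and therefore 8 * block size
blocks per group) are assumed typical ext2 values and are not stated
in the message.

```python
def bytes_per_descriptor_block(block_size=1024, desc_size=32):
    """Filesystem bytes covered by one block of group descriptors."""
    blocks_per_group = 8 * block_size       # one bitmap block, one bit per block
    group_bytes = blocks_per_group * block_size
    descs_per_block = block_size // desc_size
    return descs_per_block * group_bytes


def descriptor_blocks_needed(fs_bytes, block_size=1024, desc_size=32):
    """Number of descriptor blocks for a filesystem of fs_bytes bytes."""
    per_block = bytes_per_descriptor_block(block_size, desc_size)
    return -(-fs_bytes // per_block)        # ceiling division
```

Under these assumptions one descriptor block covers 32 groups of 8MB
each, that is 256MB, so a 1GB filesystem needs four descriptor blocks
and growing past that boundary needs a fifth.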

How do you describe the new partitions then? Will you add another
layer above the block-groups called disk-groups or something?

> That's just a property of the historical ext2fs design.

This is why I mentioned it would be better to drop this concept
of doing resizing in ext2.

> >> As far as I am concerned, simplicity really does dictate doing this
> >> just in the filesystem.
>
> > It doesn't. You have to write many more lines of code to put this
> > into the filesystem than into another abstraction layer -> LVM.
>
> You keep saying this, but that doesn't make it true! The extra work
> required to do it in the filesystem is small. If you want to do it in
> the LVM, then you _still_ need to do it in the filesystem, so you don't
> gain anything.

You NEED resizing in the filesystem, correct. But the filesystem will
not have to deal with non-linear block addressing (e.g. holes)
and different physical devices, as this is done in the LVM.
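
To illustrate the division of labour, here is a sketch of a logical
volume as a plain list of fixed-size extents. The class and method
names are hypothetical and do not describe the real LVM code; the
point is only that the filesystem keeps seeing blocks 0..size-1
regardless of where the extents live.

```python
class LogicalVolume:
    def __init__(self, extent_blocks):
        self.extent_blocks = extent_blocks
        self.extents = []  # list of (device, first physical block)

    def size(self):
        return len(self.extents) * self.extent_blocks

    def grow(self, device, phys_start):
        """Append one extent; it may live anywhere on any device."""
        self.extents.append((device, phys_start))

    def shrink(self):
        """Drop the last extent and hand it back for reuse."""
        return self.extents.pop()

    def map(self, block):
        """Translate a linear logical block to (device, physical block)."""
        if not 0 <= block < self.size():
            raise IndexError("block outside logical volume")
        idx, offset = divmod(block, self.extent_blocks)
        device, phys_start = self.extents[idx]
        return device, phys_start + offset
```

Growing or shrinking then always happens at the end of the linear
address space, whatever the physical layout looks like.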

> > And still you have to code more lines if you want to add this ability
> > to any future filesystem.
>
> That's going to be necessary anyway; none of the filesystems right now
> have any native support for run-time resizing.

Right. Assuming we add this capability to ext2 to grow/shrink
online, we don't have to care about "disk-groups" and non-linear
addressing, which make things more complicated.

> > The issue I have with tying ext2 to devices is that in my
> > view Linux/Unix consists of reusable, encapsulated blocks of
> > functionality. With this you have that functionality only in ext2,
> > not in any other filesystem type or even swap.
>
> I really don't follow the argument. You seem to be asserting that we
> can solve dynamic filesystem resizing in the block device layer without
> any explicit filesystem support. That is simply not true.

No ... but I say that if you build this into ext2 we won't have
it for any other filesystem to use. If we have a "toolkit"
in the form of the LVM, it is easier to implement for
other filesystems like reiser, dt, or lj.

> >> The filesystem-based solution also allows you to do this sort of
> >> management to ANY filesystem, regardless of whether or not you
> >> thought you'd need the feature when you first mounted it.
>
> > To any filesystem if you code all the extra stuff in it. For EVERY
> > SINGLE FILESYSTEM we will have to code very complex "hole in the
> > filesystem" things, not only "ok, take some blocks at the end".
>
> By "ANY filesystem" I was referring to any existing ext2fs filesystem.
> Personally, I like the thought that existing users will be able to grow
> their volumes without reformatting.

I like this too ... but I also want different filesystem types
to be able to grow/shrink. Think of having ext2 for my system disk
and some kind of log-structured thing for my news-base.

> The second point is that coding what you describe as "hole in the
> filesystem" is absolutely trivial compared with the whole nightmare
> issue of shrinking filesystems. That is to say, there is a massive
> amount of work you need to do to support _any_ filesystem shrinking,
> wherever in the filesystem you remove the blocks. It's essentially
> equivalent to supporting online defragmentation plus reallocation of
> inode numbers. That is _not_ easy. Given that we need to do all of
> this hard work in the filesystem anyway, it is trivial to complete the
> job and allow the fs to merge or abandon individual partitions without
> any LVM support.

This is thinking in ext2 dimensions. Other filesystem implementation
strategies might be resizeable with another 100 lines of code,
whereas ext might need 5000. This is what I am saying. It's a design
issue which should now be thought over, as we have a working
LVM. BTW: I don't want a partition table with 30 entries because
I resized things a bit ... An LVM makes this invisible and automatic
for me ...

> > Most filesystems are designed to be contiguous, so it will be
> > much easier to cut off at the end or beginning.
>
> That is simply not true, I'm afraid. Revoking the validity of ANY data
> blocks in the filesystem is hard. The issue of whereabouts you revoke
> the data from within the filesystem is an inconsequential problem in
> comparison.

? Assuming you have linear block addressing in the filesystem,
it is very easy to cut off at the end, as you don't need to
correct ANY block pointer except those to blocks moved to free the
space, but this is a problem wherever you have to move blocks.
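
A toy model of cutting off the end of a linearly addressed filesystem:
only blocks at or beyond the new end get relocated, and only pointers
to those blocks need fixing. The function and its set-of-used-blocks
input are assumptions for illustration; a real filesystem also has to
handle metadata, inode tables and so on.

```python
def plan_shrink(used, new_size):
    """Return {old_block: new_block} for used blocks at or past new_size.

    used: set of block numbers currently in use.
    Raises ValueError if the data does not fit below new_size.
    """
    to_move = sorted(b for b in used if b >= new_size)
    free_below = (b for b in range(new_size) if b not in used)
    moves = {}
    for old in to_move:
        new = next(free_below, None)
        if new is None:
            raise ValueError("not enough free space below the new end")
        moves[old] = new
    return moves
```

Every block below the new end that is already in use stays where it
is, so its pointers stay valid.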

Flo

-- Florian.Lohoff@mediaWays.net +49-5241-80-7085 aka flo@mini.gt.owl.de @HOME +49-5241-470566
