Volume management on Linux with the ext2fs.

Miguel de Icaza (miguel@nuclecu.unam.mx)
22 Apr 1997 14:43:46 -0500


Hello,

I have been considering adding volume management capabilities to
Linux, in the spirit of the IRIX volume management, where it is
possible for a file system to span multiple devices.

The idea is to allow users to plug more disks into a working file
system without forcing them to split their files, and without forcing
them to back up, set up RAID-0 across their disks and restore the
backup. A user would just plug a disk into the system, run some setup
utility and add the new disk to an existing ext2fs.

My motivation comes from reading the XFS paper. XFS is optimized
for huge file systems, huge files and huge directory contents. They
do things like metadata logging and, besides all this, XFS can talk
to the underlying volume manager and thus you can increase the volume
size at runtime.

So, I have been thinking about this a little over the past days,
and how cool it would be to have our loved ext2 file system do this
kind of thing. So, I came up with the following hack^H^H^H^Hidea,
which would let us implement some kind of volume management facilities
for the ext2fs with minimal changes to the system.

[ You can skip the following 2 paragraphs if you know some
basic ext2fs disk layout details ]

The smallest addressing unit in ext2fs is the block. Each block holds
block_size bytes (this one, you choose at mke2fs time). Now, the cool
thing is that block numbers are 32-bit, which is, well, big enough
for most purposes (this means the largest device ext2fs can handle is
between 1024*(2^32) and 4096*(2^32) bytes, that is, 4 to 16 terabytes,
depending on the block size). So, for most disk drives available at
popular stores this should be more than enough.
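
To make the arithmetic above concrete, here is a small sketch of the
size limit as a function of the block size. The function name
ext2_max_bytes is made up for this example; it is not part of the
ext2fs code.

```c
#include <assert.h>
#include <stdint.h>

/* Largest file system, in bytes, that 32-bit block numbers can address.
 * block_size is the value chosen at mke2fs time (1024, 2048 or 4096).
 * Hypothetical helper, for illustration only. */
static uint64_t ext2_max_bytes(uint32_t block_size)
{
    assert(block_size == 1024 || block_size == 2048 || block_size == 4096);
    return (uint64_t)block_size << 32;   /* block_size * 2^32 */
}
```

With 1024-byte blocks this gives 2^42 bytes and with 4096-byte blocks
2^44 bytes.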

Ok, next, the ext2fs divides the disk into a bunch of groups. Each
one of these block groups has: its copy of the superblock, group
descriptors, block bitmap, inode bitmap, inode table, and the data
blocks. Now, this is cool thing #2. Basically, this means that
every block group can survive independently from the rest of the file
system, mhm, well, ok, kind of.

The idea is that we can abuse the block_number all over the file
system and allow the block_number to exceed the number of blocks in
a single device. We would use the block_number to identify which
device holds the information at hand.

So, when a user adds a new disk to an existing file system, the
ext2fs code will get the number of available blocks on this extra disk
and the number of block groups, and add this information to the number
of blocks/block groups it already knows about.
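
Roughly, the bookkeeping described above could look like the sketch
below. Everything in it (struct vol_table, vol_add_device,
vol_map_block, the limit of 8 devices) is invented for illustration
and assumes the simplest layout, where each new disk gets the block
numbers right after the previous one. It is not existing ext2fs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VOL_MAX_DEVICES 8          /* hypothetical limit */

struct vol_device {
    uint32_t first_block;          /* first global block number on this device */
    uint32_t nblocks;              /* blocks contributed by this device */
    uint32_t ngroups;              /* block groups contributed by this device */
};

struct vol_table {
    struct vol_device dev[VOL_MAX_DEVICES];
    int      ndevices;
    uint32_t total_blocks;         /* blocks the file system knows about */
    uint32_t total_groups;         /* block groups the file system knows about */
};

/* Append a device to the volume.  Returns its index, or -1 if the table
 * is full or the block numbers would no longer fit in 32 bits. */
static int vol_add_device(struct vol_table *t, uint32_t nblocks, uint32_t ngroups)
{
    assert(t != NULL);
    if (t->ndevices >= VOL_MAX_DEVICES || nblocks == 0)
        return -1;
    if (nblocks > UINT32_MAX - t->total_blocks)
        return -1;

    struct vol_device *d = &t->dev[t->ndevices];
    d->first_block = t->total_blocks;
    d->nblocks = nblocks;
    d->ngroups = ngroups;
    t->total_blocks += nblocks;
    t->total_groups += ngroups;
    return t->ndevices++;
}

/* Use the global block number to find the device holding the block.
 * Stores the device-local block number in *local and returns the device
 * index, or -1 if the block lies beyond the last device. */
static int vol_map_block(const struct vol_table *t, uint32_t block, uint32_t *local)
{
    assert(t != NULL && local != NULL);
    for (int i = 0; i < t->ndevices; i++) {
        const struct vol_device *d = &t->dev[i];
        if (block >= d->first_block && block - d->first_block < d->nblocks) {
            *local = block - d->first_block;
            return i;
        }
    }
    return -1;
}
```

A linear scan is probably fine for a handful of disks; a real
implementation would also have to keep this table on disk.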

We can even implement this as an optional feature. For example,
instead of using the ``sb->u.ext2_sb.s_groups_count'' variable
directly, we should use a macro, let's say EXT2_GROUP_COUNT(sb), which
would be defined as ``sb->u.ext2_sb.s_groups_count'' for the regular
ext2fs case and as some different thing for the case where we have
volume management turned on.
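
As a sketch of what that macro might look like: the option name
CONFIG_EXT2_VOLUME, the field s_vol_groups_count and the helper
ext2_group_is_valid are all made up here, and the two structs are
cut-down stand-ins for the kernel ones so the fragment compiles on
its own.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel structures, reduced to the fields used here. */
struct ext2_sb_info {
    unsigned long s_groups_count;      /* groups on the original device */
    unsigned long s_vol_groups_count;  /* hypothetical: groups on all devices */
};

struct super_block {
    union {
        struct ext2_sb_info ext2_sb;
    } u;
};

#ifdef CONFIG_EXT2_VOLUME              /* hypothetical option name */
#define EXT2_GROUP_COUNT(sb) ((sb)->u.ext2_sb.s_vol_groups_count)
#else
#define EXT2_GROUP_COUNT(sb) ((sb)->u.ext2_sb.s_groups_count)
#endif

/* Callers go through the macro instead of touching the field directly,
 * so the same check works with or without volume management. */
static int ext2_group_is_valid(const struct super_block *sb, unsigned long group)
{
    assert(sb != NULL);
    return group < EXT2_GROUP_COUNT(sb);
}
```

With the option turned off this compiles to exactly the code we have
today, so the regular case pays nothing.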

Of course, we need to take care of recording all of this
information on the superblocks, enhancing the existing ext2 utilities
and so on.

Comments on this? Is this proposal completely foolish?

Have a nice day,
Miguel.

-- 
miguel@roxanne.nuclecu.unam.mx     
The GNU Midnight Commander: http://mc.blackdown.org/mc
Linux/SPARC project:        http://www.geog.ubc.ca/sparclinux.html