On Dec. 31, 2008, 17:57 +0200, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 2008-12-31 at 17:19 +0200, Boaz Harrosh wrote:
>> Andrew Morton wrote:
>>> On Tue, 16 Dec 2008 17:33:48 +0200
>>> Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
>>>
>>>> We need a mechanism to prepare the file system (mkfs).
>>>> I chose to implement that by means of a couple of
>>>> mount options, because there is no user-mode API for committing
>>>> OSD commands. Also, all this stuff is highly internal to
>>>> the file system itself.
>>>>
>>>> - Added two mount options mkfs=0/1,format=capacity_in_meg, so mkfs/format
>>>>   can be executed by kernel code just before mount. An mkexofs utility
>>>>   can now be implemented by means of a script that mounts and unmounts the
>>>>   file system with the proper options.
>>>
>>> Doing mkfs in-kernel is unusual. I don't think the above description
>>> sufficiently helps the uninitiated understand why mkfs cannot be done
>>> in userspace as usual. Please flesh it out a bit.
>>
>> There are a few main reasons.
>> - There is no user-mode API for initiating OSD commands. Such a subsystem
>>   would be a hundredfold bigger than the mkfs code submitted. I think it would be
>>   hard and stupid to maintain a complex user-mode API just for creating
>>   a couple of objects and writing a couple of on-disk structures.
>
> This is really a reflection of the whole problem with the OSD paradigm.
>
> In theory, a filesystem on OSD is a thin layer of metadata mapping
> objects to files. Get this right and the storage will manage things
> like security, access and attributes (there's even a natural mapping
> to the VFS concept of extended attributes). Plus, the storage has
> enough information to manage persistence, backups and replication.
>
> The real problem is that no-one has actually managed to come up with a
> useful VFS<->OSD mapping layer (even by extending or altering the VFS).
> Every filesystem that currently uses OSD has a separate direct OSD
> speaking interface (i.e. it slices out the block layer to do this and
> talks directly to the storage).
>
> I suppose this could be taken to show that such a layer is impossibly
> complex, as you assert, but its lack is reflected in strange-looking
> design decisions like in-kernel mkfs. It would also mean that there
> would be very little layered code sharing between OSD-based filesystems.

I think that we may need to gain some more experience before we can extract
the commonalities of such file systems. So far we have come up with the
lowest common denominator: the OSD initiator library, which deals
with command formatting and execution, including attributes, sense status,
and security.

To provide a higher-level abstraction that would help with "administrative"
tasks like mkfs, we have already floated an idea in the past:
a file system that represents the contents of an OSD as a namespace,
for example: partition_id/object_id/{data, attrs/..., ctl/...}.
Such a file system could provide a generic mapping that one could
use to easily develop management applications for the OSD. That said,
it is out of scope for exofs, which focuses mostly on the filesystem
data and metadata paths.