Idea: Network RAID == Distributed FS

Glenn.McGrath (Glenn.McGrath@jcu.edu.au)
Sun, 21 Mar 1999 08:21:25 +1000


Hi people, disclaimer first: I'm new to this list and have a lot to learn
about kernels and distributed filesystems (DFS).

The way I see it, network technology is currently developing at a much
faster rate than local storage. Bleeding-edge gigabit ethernet (~120 MB/s)
is faster than Ultra2 Wide SCSI (80 MB/s max), so getting data
simultaneously from a dozen servers (IDE drives or whatever) on a switched
gigabit network is the fastest way to get data into your machine.
Comparisons can be made with other technology, but I'm sure you get the
point.
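To put rough numbers on it (back-of-the-envelope figures, not
measurements, and assuming roughly 10 MB/s out of a cheap IDE drive):

    1 Gbit/s / 8            =  125 MB/s theoretical, call it ~120 MB/s usable
    12 servers x ~10 MB/s   =  ~120 MB/s aggregate

so a dozen ordinary servers could in principle saturate a gigabit link,
which is more than any one of those disks could deliver locally.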

I realise that Linux already has distributed filesystem support with Coda
(xFS is on the horizon too), but I thought this might be a different
approach that could have some advantages over other DFSs. I thought it
worthy of discussion anyway, so let me know if I'm wrong.

By combining two existing features, a network filesystem* (e.g. NFS,
SMB?) and RAID, it should be possible to set up a RAID array over the
network instead of over /dev/hdXX or /dev/sdXX, with minimal effort
compared to writing a DFS from scratch or one without kernel-level
support (a rough sketch of what I mean follows the list below). The
advantages I see in this method are:

1) NFS and RAID already exist; it's always good to re-use existing,
time-proven code.
2) NFS is filesystem independent, i.e. you could set up NFS on top of
ext2, fat32 or any Linux-supported filesystem, so each node of the
distributed filesystem could be a different filesystem type.
3) Because it lives (partly?) in the kernel it gains some magical
performance increase (well, I don't know myself, but that's how legend
has it, although from what I've seen great debate goes into what belongs
in kernel vs. user space).
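To make the idea concrete, here is roughly the kind of setup I have in
mind: use the loop driver to turn files on NFS mounts into block devices
and hand those to the md driver. This is untested, the node names and
paths are made up, and the raidtab details are from memory, so treat it
as a sketch only.

    # on each storage node, export a directory over NFS (/etc/exports)
    /export/raid    client(rw,no_root_squash)

    # on the client, mount each export and create one big file per node
    mount -t nfs node1:/export/raid /mnt/node1
    mount -t nfs node2:/export/raid /mnt/node2
    mount -t nfs node3:/export/raid /mnt/node3
    dd if=/dev/zero of=/mnt/node1/chunk bs=1024k count=1000
    dd if=/dev/zero of=/mnt/node2/chunk bs=1024k count=1000
    dd if=/dev/zero of=/mnt/node3/chunk bs=1024k count=1000

    # turn each remote file into a local block device with the loop driver
    losetup /dev/loop0 /mnt/node1/chunk
    losetup /dev/loop1 /mnt/node2/chunk
    losetup /dev/loop2 /mnt/node3/chunk

    # describe a RAID-5 set over the loop devices (/etc/raidtab)
    raiddev /dev/md0
        raid-level              5
        nr-raid-disks           3
        nr-spare-disks          0
        chunk-size              64
        persistent-superblock   1
        device                  /dev/loop0
        raid-disk               0
        device                  /dev/loop1
        raid-disk               1
        device                  /dev/loop2
        raid-disk               2

    # build the array, put an ordinary filesystem on it, and mount it
    mkraid /dev/md0
    mke2fs /dev/md0
    mount /dev/md0 /dfs

As far as md is concerned the loop devices are just local block devices;
all the network traffic happens underneath, inside NFS.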

Problems:
1) Latency would be a problem for small files; the common method used by
most filesystems is to group files into chunks and send the whole lot.
Of course, small files could be stored/cached locally.
2) I would have thought RAID 2/3 would be handy here; they don't
currently exist in Linux's RAID implementation because the reliability of
local drives isn't really in question, but RAID 4/5 would be the fastest
(there is a small sketch of failure handling after this list).
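One nice side effect of re-using md here is that its existing status and
hot-swap interface would apply to a dead server the same way it applies
to a dead disk. Assuming the loop-over-NFS setup sketched above (again
untested, device names made up), something like:

    # see whether the array is running degraded
    cat /proc/mdstat

    # if node2 has gone away and md has kicked its loop device out as
    # faulty, pull it from the array and keep running on the remaining
    # disks plus parity
    raidhotremove /dev/md0 /dev/loop1

    # when node2 comes back, re-add it and let md rebuild onto it
    raidhotadd /dev/md0 /dev/loop1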

I'm sure there are some problems I'm overlooking; it would take a while
to nut out the fine details of how it should work, e.g. coordinating
which operations should/shouldn't proceed with regard to permissions,
security, etc. But maybe this has even been tried before and I'm just not
aware of it.

* I distinguish a network filesystem as the client/server model, whereas
a distributed filesystem has multiple clients and multiple servers.

Thanks

Glenn McGrath
Glenn.McGrath@jcu.edu.au

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/