Re: PATCH: Raw device IO for 2.1.131

Khimenko Victor (khim@sch57.msk.ru)
Wed, 16 Dec 1998 11:01:54 +0300 (MSK)


15-Dec-98 19:03 you wrote:
> So, someone mentioned that Linux is all about "technical" issues, not what
> people want but purely on a 'this is good and this is bad technology' basis.

Not exactly. It is more a matter of looking at every solution to an existing
problem from the viewpoint "how will this thing affect our life over the next
10 years?", at least where it touches the API. Temporary hacks in the code are
OK as long as they can easily be removed in the future.

> OK, a possible "technical" problem is, I want to have 2 linux boxes(or more)
> connected to the same scsi disks. (twin tailed or what have you). I have
> running 2 instances of the same software both accessing those disks. For
> obvious reasons, load balancing, spread load of jobs, and failover, if a
> node fails, at least the other instance still has access to the disk and can
> RECOVER the data. Because my logfiles are also 'shared' so I can access the
> other node's logfiles and recover from that.

I cannot see how all this would work without specially designed software
and hardware!

> We cannot use a filesystem, since we do not have a real distributed filesystem
> yet (note we need Performance here. so don't come with coda and what have you...)

Key word is "YET".

> How do we solve this on every OS other than linux ? We use raw devices, since
> when we do a write, we know it's on the disk (there are no issues with scsi
> controller cache...) All committed writes are captured and whenever something
> needs to be recovered we have all the data needed in the logfiles(also on raw
> devices so it contains all data).

This is clearly an ugly hack and not acceptable as a long-term solution, IMNSHO.
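
For readers who want to see what "when we do a write, we know it's on the disk"
means at the system call level, here is a minimal sketch in C. It is NOT the
raw device patch under discussion: it only uses the standard POSIX O_SYNC flag
and pwrite(), which still go through the buffer cache and say nothing about
caches inside the controller or the drive. The function name write_through and
the path passed to it are made up for the example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose pwrite() on strict compilers */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at byte offset off and return only
 * after the kernel reports the data as written to the underlying device.
 * Returns 0 on success, -1 on error.
 */
static int write_through(const char *path, const void *buf,
                         size_t len, off_t off)
{
    const char *p = buf;
    int fd;

    assert(path != NULL && buf != NULL);

    /* O_CREAT is here only so the sketch can be tried on a regular file. */
    fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    /* pwrite() may write less than requested, so loop until done. */
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```

A database log writer would call something like this for every committed
record; whether that is good enough compared with a true raw device is exactly
the point being argued here.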

> The way the disks are shared depends on the hardware architectures, we do not
> really care, as long as all nodes can access the disks, even when a node
> fails, the disk local to that node should still remain visible to the others,
> like with RVSDs, another server takes over control.

> This is a widely used setup, very important for availability and failover. And
> the same architecture lends itself well for loadbalancing and stuff as well.
> But... obviously it is of no importance to some folks... or at least. its easy
> to say : No we don't want that (coz we don't need it ourselves?) but give an
> alternative solution ? short/longterm ? I haven't heard any yet in the entire
> thread that is going on. Too bad...

Since this problem has not been raised in this thread yet :-)) IMO the only
clean solution would be changes to ext2fs, or maybe a special filesystem.

> There is also the fact that raw io for databases IS faster. Whatever type
> filesystem you design, doesn't matter since we know which blocks to write
> where. An index entry points to a specific block/file/slot so its easy to
> calculate the offset in the 'file' ;) And except for full table scans, the
> data is spread allover the place, so read-ahead into buffercache doesn't do
> didley squad in that case.

But you still have to keep track of the space used by the different tables in
the database :-)) This is EXACTLY a filesystem's job. Of course you could build
an internal filesystem into the database, but the much cleaner way is to
fix/extend an existing filesystem.
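
Both halves of this exchange fit in a few lines of C. The sketch below is only
an illustration: the block size, slot size, device size and all names are
assumptions, not taken from any real database. slot_offset() is the "easy to
calculate the offset" part; alloc_block()/free_block() are the bookkeeping
that somebody still has to do, i.e. a (very small) filesystem.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 8192u  /* assumed database block size in bytes */
#define SLOT_SIZE   128u  /* assumed fixed slot size inside a block */
#define NBLOCKS      64u  /* assumed number of blocks on the device */

/* Byte offset on the raw device of a given slot in a given block. */
static uint64_t slot_offset(uint64_t block, uint32_t slot)
{
    assert(block < NBLOCKS);
    assert(slot < BLOCK_SIZE / SLOT_SIZE);
    return block * (uint64_t)BLOCK_SIZE + (uint64_t)slot * SLOT_SIZE;
}

/* Bit i set means block i is in use. This is the "filesystem work". */
static uint64_t used_map;

/* Return the lowest free block and mark it used, or -1 if the device is full. */
static int alloc_block(void)
{
    unsigned i;

    for (i = 0; i < NBLOCKS; i++) {
        if (!(used_map & (UINT64_C(1) << i))) {
            used_map |= UINT64_C(1) << i;
            return (int)i;
        }
    }
    return -1;
}

/* Mark block i as free again. */
static void free_block(unsigned i)
{
    assert(i < NBLOCKS);
    used_map &= ~(UINT64_C(1) << i);
}
```

The offset arithmetic really is trivial; the allocation map is the part that
grows into extents, per-table accounting and recovery once it has to be
persistent and shared between nodes.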

> Whether Raw dev make it or not is not the issue for me (altho I think Stephen
> did a cool job;)... but I would like to hear solutions to the above, if raw
> dev ain't the way to go technology-wise... if you cannot give a solution, then
> what keeps you from implementing raw dev as a short term solution (2.3) ?

There is a lesson from the history of computing: short-term solutions stay
until the end of the system. Even Windows 98 still supports the ugly FCB file
access from MS/PC DOS 1.x of the early 80s !!! In other words: there is NO such
thing as a "short-term solution". If such a thing is added in 2.3, then in 2010
we'll be trying to get rid of this "short-term solution" and in 2020 this crap
will still be in the kernel. In short: SHORT-TERM SOLUTIONS ARE NOT ALLOWED in
the mainstream kernel!

> Right now, we do not have the possibility to run parallel server on linux, not
> because we do not 'want' it but because linux does not offer us a solution.
> And the other OS's do. Clustering is not just 'beowulf'... there is more than
> that. DLM's and all that stuff is doable.. but the disk access is what we
> miss.

No. Not because Linux does not offer you a solution, but because you demand a
solution in the mainstream kernel. A clean solution has not been designed, and
ugly solutions are not allowed. But if you really want a "short-term solution"
you can always develop a module/patch for the kernel that adds all the needed
functions. A separate patch would be enough as a short-term solution to your
problem, and there would be no pollution of the mainstream kernel. Why can't a
short-term solution be removed? The story is always the same:
1. Someone develops a "temporary hack" to solve some problem "quick and dirty".
2. The hack is added to the system as a "short-term solution".
3. A lot of developers use the hack, unaware of its "temporary" status.
4. Now the hack is stuck and cannot be removed without a herculean effort (if at all).
If the short-term hack is not in the mainstream kernel but in a module
associated with Oracle, then someone may still use this module for his (her)
programs, but in that case he (she) can blame only himself (herself) when the
module disappears in time.

> all flames > /dev/null.
