Re: [Patch] Cleanup struct gendisk registration, 2.3.40-pre1

From: Guest section DW (dwguest@win.tue.nl)
Date: Sun Jan 16 2000 - 16:15:32 EST


On Sun, Jan 16, 2000 at 10:32:24AM -0800, Linus Torvalds wrote:
>
> On Sun, 16 Jan 2000, Guest section DW wrote:
> >
> > On the other hand, a problem that does occur is the aliasing
> > between block 15003 of hda and block 3 of hda2.
> > It is a misdesign of the buffer cache. [Wait until 2.5 before repairing.]
> > The current state of affairs causes strange things around fdisk
> > and various partition repair programs.
>
> Note that my suggestion for this particular thing (and all classes of
> these things) is very simple:
>
> - fsync the buffer cache at each close (we do this already, I think)
>
> - throw the buffer cache away completely whenever there are no open users
> of the block device.
>
> Yes, right now the "cache the floppy contents around opens" is really nice
> for things like "mtools" etc, which open and close the floppy device all
> the time and as long as we cache it globally we're cool.
>
> But the advantage is pretty small, and not really worth it.

Hmm. People who actually still use floppies would hate you, I suppose.
Hmm. And thinking about it, I would be unhappy too.
The first time one does a mount (of a 10 GB filesystem) this takes
a long time. Unmounting and doing a mount again immediately afterwards
takes negligible time because the data is still in core.

And this fsync doesnt really help at all, does it?
It remains the case that programs that access block 15003 of hda
and block 3 of hda2 will have alias problems as long as there is
at least some process that still has these devices open.
The problem really lies in the fact that the buffer cache never
considers the possibility that blocks on different devices may
in fact be the same block. Also with loop this causes problems.

Trivial solutions exist. For example, one can have buffer cache
at the disk level, not at the partition level.
That eliminates aliasing in a trivial way.

But if I have to invent a structure on the spot I would like
something more flexible. Having a cache only at the level
closest to the device may not be the best solution.

Look at the general situation. We have blocks in files on filesystems,
that live on a logical device. This logical device may itself
be a file in another filesystem (as in the case of loop) or
may be spread out over several disks (as in the case of md),
or live on another machine (as with nfs), so the general picture
is just a tree of namespaces and if one wants a block the request
bubbles up through all layers until some physical device is found.
Now a request is inserted into the queue.
At each layer there may be a cache. And at each layer there is a
cache coherency problem.

Why doesnt this bother us all the time - the fact that block 17
of the file foo is the same block as block 1017 on hda2?
Maybe because there almost always is a canonical path to
a given bit on the disk. Only in exceptional cases does one
access a file via /dev/hda2 instead of via the mounted ext2
filesystem. Only with loop we see frequent problems.

So, suppose we assign to every bit on the disk a maintainer,
someone who is responsible.
If I want to read from or write to /dev/hda then the entity
responsible for hda sees that there exist subentities hda1 .. hda8
and either determines itself which of these, if any, should be
interested, or just asks each of them. And /dev/hda2 sees that
it is mounted as a filesystem, and either flushes (for read) /
invalidates (for writes) all its buffers [namely in case there is
no code to determine more precisely which of these buffers might
be affected] or does the right thing with the unique buffer
that is implicated. Etc., down until the one who is responsible.

In other words, the normal case, where things are accessed via
the correct channels would stay precisely the same because the
reader/writer would be the responsible person.
The rare case, where people use a noncanonical access channel
would be correct but probably rather inefficient.

Fdisk would not get slower, because the partition blocks are not
part of a filesystem, so that /dev/hda itself is responsible.

So far some very loose ideas, invented on the spot.

Andries

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jan 23 2000 - 21:00:14 EST