Re: Is there something wrong here?

Linus Torvalds (torvalds@transmeta.com)
16 Jan 1999 09:06:36 GMT


In article <77pj0o$9sk$1@palladium.transmeta.com>,
Linus Torvalds <torvalds@transmeta.com> wrote:
>
>Basically, I _suspect_ that what is going on is somebody waiting on a IO
>entity just because the IO entity is locked - never mind the fact that
>it is perfectly up-to-date, and it is only locked for IO because it is
>busy getting written out.

Duh. I _stared_ at it right in the face, and still I missed it.

The buffer.c thing serializes _way_ too much. In particular, imagine
that you want to read a buffer (for doing indirect block lookup, or
reading a directory or something), and you call "bread()" in the
filesystem.

Now, assume that you have the buffer in memory and up-to-date, so you'd
expect to get the data immediately, wouldn't you?

You won't. "bread()" will call "getblk()", which will then call
"get_hash_table()", which in turn will find the buffer for you, and then
it will _wait_ on it - because it's being written out to disk. Duh.
Double duh.

It makes no sense at all - we should just return the locked buffer, and
read the data from there (or even _write_ it - even if we're in the
middle of writing it to disk that's fine, because we'll mark it dirty
and later write out the correct version).

This problem must have been there since day one. I'm almost afraid of
looking at linux-0.01 because I suspect it will be there too. No wonder
some people complain that their machines are less than responsive while
they are writing out data.

Anyway, as I wrote above I've made this same mistake multiple times:
it's really easy to go overly careful, and use a "locked" condition to
mean that it can't be used at all, when it often really only serializes
operations that change the _state_ of the buffer rather than any of the
_contents_.

(So in this case, a buffer being "locked" doesn't mean that we can't use
it, it only means that we can't do physical IO on it - trying to do
multiple concurrent IO on the buffer at the same time would be stupid
and silly, but that doesn' tmean that using the _data_ is stupid and
silly).

If anybody wants to try whether this is why they have problems during
disk IO, the fix should be fairly straightforward. Look in the file
fs/buffer.c, function get_hash_table(), you'll find something like

...
bh = find_buffer(dev,block,size);
if (!bh)
break;
bh->b_count++;
bh->b_lru_time = jiffies;
if (!buffer_locked(bh))
break;
...

and make the thing break out if the buffer is uptodate, ie change the
second if-statement to

if (buffer_uptodate(bh) || !buffer_locked(bh))
break;

does that make your performance less jerky?

NOTE NOTE NOTE! I did only a _very_ quick scan of this, and I _think_
the above is safe, but I would suggest some care. If you haven't backed
up lately, I would only like to point out that if you do the above,
everything may work out beautifully, and you may get a really hot date
tomorrow. Who knows? God moves in mysterious ways.

But you should also keep in mind that it's 1AM when I'm writing this,
it's a Friday evening, and you have no idea whether I'm so dead drunk
that I can't even sit straight, much less make sense of kernel sources.
I'd really like to know if the above makes any change to any behaviour,
but you'd better be aware that the above function is the "heart" of the
Linux buffer management, and changing it is not to be done lightly.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/