Re: [RFC] Per-inode metadata cache.

Stephen C. Tweedie (sct@redhat.com)
Mon, 18 Oct 1999 23:02:02 +0100 (BST)


Hi,

On Mon, 18 Oct 1999 13:26:45 -0400 (EDT), Alexander Viro
<viro@math.psu.edu> said:

>> You can't even know which is the inode Y that is using a block X without
>> reading all the inode metadata while the block X still belongs to the
>> inode Y (before the truncate).

> WTF would we _need_ to know? Think about it as about memory caching. You
> can cache by virtual address and you can cache by physical address. And
> since we have no aliasing here...

We still have the underlying problem. The hash lists used for buffer
cache lookup are only part of the buffer cache data structures. The
other part --- the buffer lists --- are independent of the namespace you
are using. If you have a dirty buffer in an inode's metadata cache,
then yes, you still need to deal with the aliasing left when that data
is deallocated unless you explicitly revoke (bforget) that buffer as
soon as it is deallocated.

>> Right now you know only that the stale buffer can't came from inode
>> metadata as we have the race prone slowww trick to loop in polling mode
>> inside ext2_truncate until we'll be sure bforget will run in hard mode.

> And? Don't use bread() on metadata. It should never enter the buffer hash,
> just as buffer_heads of _data_ never enter it. Leave bread() for
> statically allocated stuff.

It's write, not read, which is the problem: the danger is that you have
a dirty metadata buffer which is deleted and reallocated as data, but
the buffer is still dirty in the buffer cache and can stomp on your
freshly allocated data buffer later on.

The cache coherency isn't the problem as much as _disk_ coherency.
cache coheren

>> And even if you know such information you should do a linear search in the
>> list that's slower than an hash lookup anyway (or you can start slowly
>> unmapping all the other stale buffers even if you are not interested
>> about, so you may block more than necessary). Doing a per-object hash is
>> not an option IMHO.

> You are kinda late with it - it's already done for data. The question
> being: do we ever need lookups by physical address (disk location)?
> AFAICS it is _not_ needed.

>> > d) Moving the second class into the page cache will cause problems
>> >with bigmem stuff. Besides, I have the reasons of my own for keeping those

>> I can't see these bigmem issues. The buffer and page-cache memory is not
>> in bigmem anyway. And you can use bigmem _wherever_ you want as far as you
>> remeber to fix all the involved code to kmap before read/write to
>> potential bigmem memory. bigmem issue looks like a red-herring to me.

> Right. And you may want different policies for data and metadata - the
> latter should always be in kvm. AFAICS that's what SCT refered to.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/