Re: Questions: bforget & forgotten pages

Kenneth Albanowski (kjahds@kjahds.com)
Fri, 28 Aug 1998 01:28:36 -0400 (EDT)

I'm afraid this didn't touch on my questions, and that is purely my fault.
To explain a bit more: as part of my work on uClinux, I am exploring how
the buffer cache allocation can be controlled in a system with a very
limited set of block devices. (Specifically, just the ramdisk and some
romdisks that usually can bypass the buffer cache.)

For the most part, this simply means leaving the buffer cache alone, and
just not provoking activities that would cause it to change size. I could
easily put in limits to its maximum size, but for the moment am more
interested in letting it "float" at a reasonable size, without extensive
modifications to its allocation methods.

On Thu, 27 Aug 1998, Bill Hawes wrote:

> Kenneth Albanowski wrote:
>
> > For nefarious reasons of my own, I'm interested in keeping buffer usage
> > constant, in a system with only the rd block device. I've started out by
> > populating the ram disk with an un-holey filesystem image, and from there
> > it does a good job of not using more or less memory -- with one exception.
> >
> > I finally figured out what was causing a bizarre "leak": the bforget
> > function which is invoked by ext2/truncate.c -- and _only_ by that code,
> > at least in 2.0.33. In the process of truncating a file, bforget is
> > invoked on the discarded buffers, which removes the protected bits (thus
> > turning the pages loose from the ram disk), and resets the device numbers
> > to zero without moving the page to the free list. The former is just
> > annoying, but the latter seems to have the fairly bizarre effect of
> > leaving "zombie" clean buffers around that, as far as I can tell, will not
> > be reclaimed or reused by anything short of try_to_free_buffer, or the
> > inner machinations of refill_freelist.
>
> try_to_free_buffer() and refill_freelist() are the normal mechanisms by
> which buffer pages are reclaimed, so you don't have to be worried about
> losing pages.

Sorry, I didn't mean to imply that I am "losing" pages in any sense,
merely that the buffer allocation is increasing, and I find that fairly
unusual because ext2/truncate is the _only_ operation that uses bforget,
and thus increases buffer allocation in this manner. As I said, it took
some time to figure out why the buffer cache was inflating.
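
To make the behaviour I am describing concrete, here is a toy userspace
model of it. This is only a sketch of my reading, not the 2.0.33 source:
the struct, its fields and the toy_* functions are all hypothetical names
invented for illustration.

```c
/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#include <stddef.h>

struct toy_buffer {
    int dev;          /* owning device number; 0 means "no identity" */
    int protected_;   /* nonzero: pinned on behalf of the ram disk */
    int hashed;       /* nonzero: findable through the hash table */
    int on_free_list; /* nonzero: available for immediate reuse */
};

/* Forget a buffer the way this thread describes it: drop the protection,
 * zero the device and unhash, but do not move it to the free list. */
static void toy_bforget(struct toy_buffer *bh)
{
    bh->protected_ = 0;
    bh->dev = 0;
    bh->hashed = 0;
    /* on_free_list is deliberately left untouched */
}

/* Count buffers that have no identity yet are not on the free list. */
static size_t toy_count_zombies(const struct toy_buffer *bufs, size_t n)
{
    size_t zombies = 0;
    for (size_t i = 0; i < n; i++)
        if (bufs[i].dev == 0 && !bufs[i].hashed && !bufs[i].on_free_list)
            zombies++;
    return zombies;
}
```

In this model every forgotten buffer adds one to the zombie count, and
nothing in the model ever brings that count back down, which is the
"inflation" I was seeing.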

In this context, try_to_free_buffer and refill_freelist aren't
sufficient, as memory never fills up. (And since this is an extremely
memory-limited system, filling up memory is something I wish to avoid in
the first place.) Perhaps this is an unwise idea, and I should just bump
down the buffer cache limits so that it starts reclaiming much earlier,
but this seems a less elegant approach. (And the simplest approach would
also only trigger refill_freelist.)
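
As a rough illustration of why reclaim never runs for me, here is a toy
model of pressure-driven reclaim. It assumes that reclaim is attempted
only when free memory drops below a threshold, and that one forgotten
buffer gives back one page; both are simplifications, and every name
here is hypothetical rather than taken from the kernel.

```c
/* Toy userspace model, NOT kernel code. All names are hypothetical. */

struct toy_mem {
    long free_pages;      /* pages currently free */
    long min_free_pages;  /* reclaim starts only below this threshold */
    long zombie_buffers;  /* forgotten buffers awaiting reclaim */
};

/* Reclaim forgotten buffers, but only under memory pressure.
 * Returns the number of buffers reclaimed (0 when there is no pressure). */
static long toy_reclaim_if_pressed(struct toy_mem *m)
{
    long reclaimed;

    if (m->free_pages >= m->min_free_pages)
        return 0; /* no pressure: zombies stay where they are */

    reclaimed = m->zombie_buffers;
    m->free_pages += reclaimed; /* simplification: one buffer == one page */
    m->zombie_buffers = 0;
    return reclaimed;
}
```

As long as free_pages stays above the threshold, the zombie count is
never touched, which matches what I observe on a system that is tuned
to keep memory from filling.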

> There are some dangers to using RAM disks apart from just at boot time -- the
> system memory management uses the percentage of buffer memory as a guide for
> when to free buffers, but doesn't know that protected buffers can't be freed.
> This might lead to situations where the system has reclaimed virtually all of
> the general-purpose buffers, but still thinks there's lots of buffer memory.

In this case, the RAM disk is the only writable block device around, so
the memory issues are moot.

> The RAM disk buffers also make the buffer hash chains longer, increasing the
> time to search for other general-purpose buffers.

Agreed, but this is also moot.

> > So, two questions: can I replace bforget in the ext2 code with something
> > else, brelse perhaps? (It's not obvious whether their semantics are
> > identical.) Secondly, am I mistaken about forgotten pages, and are they
> > reclaimable via some process I've missed?
>
> The semantics are most certainly not identical -- bforget is used when you
> want to get rid of a buffer's identity, as its block on disk has just been
> released. The truncate code goes to pains to ensure that the buffer won't get
> accidentally reused before it can be forgotten.

Yes, I understand the semantics aren't identical, but I also understand
that some of the issues may not be relevant (for example: is part of the
logic there to prevent zeroed indirect blocks from being written back to
disk? In that case, it's definitely irrelevant to a RAM disk). I'm trying
to _understand_ exactly what bforget does, and why it "loses" pages in
this manner, and I don't find it obvious.

You say that we want to get rid of the buffer's identity, but I don't
understand the rationale for this. (The "identity" of a buffer is the disk
media, and that most certainly has not changed.) This _isn't_ a question
of multiple-access (because bforget is only invoked when the buffer usage
count is 1), and isn't a question of reading in changed media (because
ext2 never bypasses the buffer cache). To my mind that only leaves
race conditions as the explanation, and I'm afraid they aren't very
obvious at the best of times.

To ask a simple question: why can't these forgotten buffers immediately be
put onto the free list? All the work that has been done on them (zeroing
the device, removing them from the hash table) prevents them from being
easily found, so once they are forgotten they are as good as free; they
just aren't on the free list yet.
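
What I have in mind looks roughly like the following toy userspace
sketch. It is not a proposed kernel patch and does not use real kernel
identifiers: toy_buffer, toy_cache and both functions are hypothetical,
and the model has no concurrency at all, so it says nothing about
whether this would be safe in the real buffer cache.

```c
/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#include <stddef.h>

struct toy_buffer {
    int dev;                      /* 0 means "no identity" */
    int hashed;                   /* nonzero: findable through the hash */
    struct toy_buffer *next_free; /* link used while on the free list */
};

struct toy_cache {
    struct toy_buffer *free_list; /* head of a singly linked free list */
    size_t nr_free;               /* number of buffers on that list */
};

/* Hypothetical "forget and free": strip the buffer's identity and push
 * it straight onto the free list instead of leaving it in limbo. */
static void toy_forget_and_free(struct toy_cache *c, struct toy_buffer *bh)
{
    bh->dev = 0;
    bh->hashed = 0;
    bh->next_free = c->free_list;
    c->free_list = bh;
    c->nr_free++;
}

/* Take a buffer off the free list for a new block on 'dev'.
 * Returns NULL when the free list is empty. */
static struct toy_buffer *toy_get_free(struct toy_cache *c, int dev)
{
    struct toy_buffer *bh = c->free_list;

    if (bh == NULL)
        return NULL;
    c->free_list = bh->next_free;
    c->nr_free--;
    bh->next_free = NULL;
    bh->dev = dev;
    bh->hashed = 1;
    return bh;
}
```

In the toy, a forgotten buffer is reusable by the very next allocation,
so the cache size stays constant across a truncate followed by a write.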

Again, the only explanation for this behaviour that I can think of is for
there to be a race condition, where some process could obtain a reference
to the buffer before or during the bforget, so we rely on
try_to_free_page() to eventually GC the forgotten buffer.

Thanks for your time,
Ken

-- Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)
