Re: journaling & VM (was: Re: reiserfs being part of the kernel: it's not just the code)

From: Rik van Riel (riel@conectiva.com.br)
Date: Wed Jun 07 2000 - 09:27:56 EST


On Wed, 7 Jun 2000, Stephen C. Tweedie wrote:
> On Wed, Jun 07, 2000 at 10:23:35AM -0300, Rik van Riel wrote:
> > >
> > > There is no need for subcaches at all if all of the pages can be
> > > represented on the page cache LRU lists. That would certainly
> > > make balancing between caches easier.
> >
> > Wouldn't this mean we could end up with an LRU cache full of
> > unfreeable pages?
>
> Rik, we need the VM to track dirty pages anyway, precisely so
> that we can obtain some degree of write throttling to avoid
> having the whole of memory full of dirty pages.

*nod*

> If we get short of memory, we really need to start flushing dirty
> pages to disk independently of the task of finding free pages.

Indeed, page replacement and page flushing need to be
pretty much independent of each other, with the one
gotcha that page replacement must be able to trigger
page flushing...

> > This could get particularly nasty when we have a VM with
> > active / inactive / scavenge lists... (like what I'm working
> > on now)
>
> Right, we definitely need a better distinction between different
> lists and different types of page activity before we can do this.

I think we'll want something like what FreeBSD has, mainly
because their feedback loop is really simple and has proven
to be robust.

1) active list
        This list contains the active pages; we age them,
        and they can be mapped into processes.
2) inactive list
        All pages here are ready to be reclaimed. We are
        free to reclaim the clean inactive page before the
        dirty ones (to delay/minimise IO) because no page
        ends up here unless we want to reclaim it anyway.
3) scavenge list (BSD calls this the cache list, -EOVERLOADEDWORD)
        All pages here are clean and can be reclaimed by
        any page allocation which has __GFP_WAIT set. We
        keep only a minimal number of free pages, and most
        of the time __alloc_pages() will take a scavenge
        page instead.
4) free list
        Not much of a list, the current free page structure.
        We use the pages here for atomic allocations and, when
        we have too many free pages, for normal allocations.
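The flow between the four lists can be sketched as a small user-space model. All names here (enum lru_list, struct page, alloc_page(), etc.) are illustrative stand-ins, not the actual kernel API of any Linux or FreeBSD version:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model of the four-list scheme sketched above.
 * Hypothetical names throughout; not a real kernel interface. */

enum lru_list { LRU_ACTIVE, LRU_INACTIVE, LRU_SCAVENGE, LRU_FREE };

struct page {
        enum lru_list list;     /* which LRU list the page is on   */
        int dirty;              /* must be clean before scavenging */
        struct page *next;
};

/* One singly linked stack per list keeps the model tiny. */
static struct page *lists[4];

static void list_add(struct page *p, enum lru_list l)
{
        p->list = l;
        p->next = lists[l];
        lists[l] = p;
}

static struct page *list_take(enum lru_list l)
{
        struct page *p = lists[l];
        if (p)
                lists[l] = p->next;
        return p;
}

/* Deactivation: an aged active page becomes reclaim-eligible. */
static void deactivate_page(struct page *p)
{
        list_add(p, LRU_INACTIVE);
}

/* Move one inactive page to the scavenge list.  A dirty page
 * would first be handed to the filesystem for flushing; here
 * we just mark it clean as a stand-in for that I/O. */
static int scavenge_one(void)
{
        struct page *p = list_take(LRU_INACTIVE);
        if (!p)
                return 0;
        if (p->dirty)
                p->dirty = 0;   /* stand-in for page flushing */
        list_add(p, LRU_SCAVENGE);
        return 1;
}

/* Allocation: atomic allocations use only the free list;
 * blocking (__GFP_WAIT-style) allocations prefer a scavenge
 * page, falling back to the free list. */
static struct page *alloc_page(int can_wait)
{
        struct page *p = NULL;
        if (can_wait)
                p = list_take(LRU_SCAVENGE);
        if (!p)
                p = list_take(LRU_FREE);
        return p;
}
```

The point of the model is the one-way flow: pages drain from active through inactive to scavenge, so by the time an allocator sees a page, all the hard decisions (and any I/O) are already behind it.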

The filesystem callbacks would be made for pages on the
inactive list; the filesystem (or shm, or swap subsystem)
is free to cluster any "eligible" pages with the page we
requested to be freed.

So if, e.g., we request ext3 to flush page X, the filesystem
can make its own decision on whether it also wants to flush
other inactive (or even active) pages which are contiguous
on disk with the block page X is written to.
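The clustering idea can be illustrated with a toy flush routine. Everything here (struct mpage, flush_page_cluster(), the one-block adjacency rule, the inactive-only restriction) is a made-up simplification for illustration, not ext3's or reiserfs's actual behavior:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: when asked to flush page X, also flush other dirty
 * pages whose on-disk blocks are adjacent to X's block, so one
 * reclaim request turns into one larger sequential write. */

struct mpage {
        long blocknr;   /* on-disk block this page maps to       */
        int dirty;
        int inactive;   /* this sketch clusters only inactive
                           pages; a real fs could take active
                           ones too, as discussed above          */
};

/* Flush pages[x] plus any eligible page within one block of it
 * on disk.  Returns the number of pages written. */
static int flush_page_cluster(struct mpage pages[], int n, int x)
{
        int written = 1;        /* the requested page itself */
        int i;

        pages[x].dirty = 0;

        for (i = 0; i < n; i++) {
                long d;

                if (i == x || !pages[i].dirty || !pages[i].inactive)
                        continue;
                d = pages[i].blocknr - pages[x].blocknr;
                if (d == 1 || d == -1) {
                        pages[i].dirty = 0;
                        written++;
                }
        }
        return written;
}
```

The VM only asked for one page; the decision to widen the write to disk-adjacent neighbors stays entirely inside the filesystem, which is the division of labor the callback scheme is after.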

> > Question is, are the filesystems ready to play this game?
>
> With an address_space callback, yes --- ext3 can certainly find
> a transaction covering a given page.

This is what we need...

> I'd imagine reiserfs can do something similar, but even if not,
> it's not important if the filesystem can't do its lookup by
> page.

I don't necessarily agree on this point. What if our
inactive list fills up with pages the filesystem somehow
regards as new, while the filesystem stays busy flushing
the "wrong" (in the eyes of the page stealer) pages?

regards,

Rik

--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/




This archive was generated by hypermail 2b29 : Wed Jun 07 2000 - 21:00:28 EST