Re: [patch 7/9] mm: thrash detection-based file cache sizing

From: Johannes Weiner
Date: Thu Jan 16 2014 - 16:19:03 EST


On Wed, Jan 15, 2014 at 10:57:21AM +0800, Bob Liu wrote:
> On 01/15/2014 03:16 AM, Johannes Weiner wrote:
> > On Tue, Jan 14, 2014 at 09:01:09AM +0800, Bob Liu wrote:
> >> Good job! This patch looks good to me and has nice descriptions.
> >> But it seems that this patch only fixes the issue "working set changes
> >> bigger than half of cache memory go undetected and thrash indefinitely".
> >> My concern is whether this patch set could easily be extended to
> >> address all the other issues.
> >>
> >> The other possible way is something like Peter's implementations of
> >> CART and Clock-Pro, which I think may be better because they use
> >> advanced algorithms and consider the problem as a whole from the
> >> beginning. (Sorry, I haven't had enough time to read the source code,
> >> so I'm not 100% sure.)
> >> http://linux-mm.org/PeterZClockPro2
> >
> > My patches are moving the VM towards something that is comparable to
> > how Peter implemented Clock-Pro. However, the current VM has evolved
> > over time in small increments based on real life performance
> > observations. Rewriting everything in one go would be incredibly
> > disruptive and I doubt very much we would merge any such proposal in
> > the first place. So it's not like I don't see the big picture, it's
> > just divide and conquer:
> >
> > Peter's Clock-Pro implementation was basically a double clock with an
> > intricate system to classify hotness, augmented by eviction
> > information to work with reuse distances independent of memory size.
> >
> > What we have right now is a double clock with a very rudimentary
> > system to classify whether a page is hot: it has been accessed twice
> > while on the inactive clock. My patches now add eviction information
> > to this, and improve the classification so that it can work with reuse
> > distances up to memory size and is no longer dependent on the inactive
> > clock size.
> >
> > This is the smallest imaginable step that is still useful, and even
> > then we had a lot of discussions about scalability of the data
> > structures and confusion about how the new data point should be
> > interpreted. It also took a long time until somebody read the series
> > and went, "Ok, this actually makes sense to me." Now, maybe I suck at
> > documenting, but maybe this is just complicated stuff. Either way, we
> > have to get there collectively, so that the code is maintainable in
> > the long term.
> >
> > Once we have these new concepts established, we can further improve
> > the hotness detector so that it can classify and order pages with
> > reuse distances beyond memory size. But this will come with its own
> > set of problems. For example, some time ago we stopped regularly
> > scanning and rotating active pages because of scalability issues, but
> > we'll most likely need an up-to-date estimate of the reuse distances on
> > the active list in order to classify refaults properly.
> >
>
> Thank you for your kind explanation. It makes sense to me; please feel
> free to add my review.

Thank you!

> >>> + * Approximating inactive page access frequency - Observations:
> >>> + *
> >>> + * 1. When a page is accessed for the first time, it is added to the
> >>> + * head of the inactive list, slides every existing inactive page
> >>> + * towards the tail by one slot, and pushes the current tail page
> >>> + * out of memory.
> >>> + *
> >>> + * 2. When a page is accessed for the second time, it is promoted to
> >>> + * the active list, shrinking the inactive list by one slot. This
> >>> + * also slides all inactive pages that were faulted into the cache
> >>> + * more recently than the activated page towards the tail of the
> >>> + * inactive list.
> >>> + *
> >>
> >> Nitpick, how about the reference bit?
> >
> > What do you mean?
> >
>
> Sorry, I mean the PG_referenced flag. I thought that when a page is
> accessed for the second time, only the PG_referenced flag is set and
> the page is not yet promoted to the active list.

PG_referenced is cleared during rotation and is not set on pages that
came in through readahead, but otherwise the first access sets the bit
and the second access activates the page.
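
As a rough illustration of the two-touch rule described above, here is a
minimal userspace sketch. It is a toy model and not the kernel code:
struct model_page, model_mark_accessed() and model_rotate() are
hypothetical names, and the two bools merely stand in for PG_referenced
and PG_active. The access path loosely follows the logic of
mark_page_accessed().

```c
#include <stdbool.h>

/* Hypothetical toy model of a page cache page; not the kernel's struct page. */
struct model_page {
	bool referenced;	/* stands in for PG_referenced */
	bool active;		/* stands in for PG_active */
};

/*
 * Model of one access through the page cache:
 *   inactive, unreferenced -> inactive, referenced   (first access)
 *   inactive, referenced   -> active, unreferenced   (second access)
 * An already active page only gets its referenced bit set.
 */
static void model_mark_accessed(struct model_page *page)
{
	if (!page->active && page->referenced) {
		page->active = true;
		page->referenced = false;
	} else if (!page->referenced) {
		page->referenced = true;
	}
}

/* Model of rotating a page on its list: the referenced bit is cleared. */
static void model_rotate(struct model_page *page)
{
	page->referenced = false;
}
```

Starting from a page with both flags clear, two calls to
model_mark_accessed() activate it. A model_rotate() between the two
calls leaves the page inactive with only the referenced bit set again.
A readahead page corresponds to a page that was added without any
model_mark_accessed() call, so it still needs two real accesses.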