Re: [PATCH 1/2] vmscan: promote shared file mapped pages

From: Johannes Weiner
Date: Wed Nov 02 2011 - 12:32:01 EST


On Mon, Aug 08, 2011 at 04:18:11PM +0400, Konstantin Khlebnikov wrote:
> Pekka Enberg wrote:
> >Hi Konstantin,
> >
> >On Mon, Aug 8, 2011 at 2:06 PM, Konstantin Khlebnikov
> ><khlebnikov@xxxxxxxxxx> wrote:
> >>Commit v2.6.33-5448-g6457474 (vmscan: detect mapped file pages used only once)
> >>greatly decreases the lifetime of single-use mapped file pages.
> >>Unfortunately, it also decreases the lifetime of all shared mapped file pages,
> >>because after commit v2.6.28-6130-gbf3f3bc (mm: don't mark_page_accessed in fault path)
> >>the page-fault handler does not mark a page active or even referenced.
> >>
> >>Thus page_check_references() activates a file page only if it was used twice while
> >>it stays on the inactive list, whereas it activates anon pages after the first access.
> >>The inactive list can be small, so the reclaimer can accidentally
> >>throw away any widely used page if it wasn't used twice in a short period.
> >>
> >>After this patch, page_check_references() also activates a mapped file page at the first
> >>inactive list scan if the page is already used multiple times via several ptes.
> >>
> >>Signed-off-by: Konstantin Khlebnikov<khlebnikov@xxxxxxxxxx>
> >
> >Both patches seem reasonable but the changelogs don't really explain
> >why you're doing the changes. How did you find out about the problem?
> >Is there some workload that's affected? How did you test your changes?
> >
>
> I found this while trying to fix a degradation in rhel6 (~2.6.32) compared with rhel5 (~2.6.18).
> It is a complete mess with >100 web/mail/spam/ftp containers:
> they share all their files, but there are a lot of anonymous pages:
> ~500MB of shared file-mapped memory and 15-20GB of non-shared anonymous memory.
> In this situation major page faults are very costly, because all containers share the same page.
> Under my load the kernel put disproportionate pressure on file memory compared with anonymous memory;
> the two evened out only when I raised swappiness to 150 =)
>
> These patches actually did not help much with my problem,
> but I saw a noticeable (10-20 times) reduction in the count and average time of major page faults in file-mapped areas.
>
> Actually both patches are fixes for commit v2.6.33-5448-g6457474,
> because it was aimed at one scenario (singly used pages),
> but it breaks the logic in other scenarios (shared and/or executable pages)

I suspect that while saving shared/executable mapped file pages more
aggressively helps to some extent, the underlying problem is that we
tip the lru balance (comparing the recent_scanned/recent_rotated
ratios) towards reclaiming file pages too much and in unexpected places.

For mapped file, we do:

add to lru: recent_scanned++
cycle: recent_scanned++
[ activate: recent_scanned++, recent_rotated++ ]
[ deactivate: recent_scanned++, recent_rotated++ ]
reclaim: recent_scanned++

while for anon:

add to lru: recent_scanned++, recent_rotated++
reactivate: recent_scanned++, recent_rotated++
deactivate: recent_scanned++, recent_rotated++
[ activate: recent_scanned++, recent_rotated++ ]
[ deactivate: recent_scanned++, recent_rotated++ ]
reclaim: recent_scanned++

As you can see, even a long-lived file page tips the balance to the
file list twice: on creation and during the used-once detection. A
thrashing file working set as in Konstantin's case will actually be
seen as a lucrative source of reclaimable pages.
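
To put rough numbers on the lists above, here is a small Python model of the
counters. It is only a sketch: the pressure formula is an assumption modelled
on how get_scan_count() weighs recent_scanned against recent_rotated, and the
helper names, the 1000-page workload and the swappiness of 60 are made up for
illustration. Optional (bracketed) steps are left out.

```python
# Illustrative model only, not kernel code.
# Each event is (name, recent_scanned delta, recent_rotated delta), taken
# from the lifecycle lists above; the bracketed optional steps are omitted.
MAPPED_FILE_EVENTS = [
    ("add to lru", 1, 0),
    ("cycle", 1, 0),
    ("reclaim", 1, 0),
]
ANON_EVENTS = [
    ("add to lru", 1, 1),
    ("reactivate", 1, 1),
    ("deactivate", 1, 1),
    ("reclaim", 1, 0),
]


def account(events, pages):
    """Total (recent_scanned, recent_rotated) after `pages` full lifecycles."""
    scanned = sum(s for _, s, _ in events) * pages
    rotated = sum(r for _, _, r in events) * pages
    return scanned, rotated


def pressure(prio, scanned, rotated):
    """Assumed formula: pressure grows with scanned and shrinks with rotated."""
    return (prio + 1) * (scanned + 1) // (rotated + 1)


def file_share(swappiness, anon_counts, file_counts):
    """Fraction of the total scan pressure that lands on the file LRU."""
    ap = pressure(swappiness, *anon_counts)
    fp = pressure(200 - swappiness, *file_counts)
    return fp / (ap + fp + 1)


if __name__ == "__main__":
    anon = account(ANON_EVENTS, 1000)
    mapped = account(MAPPED_FILE_EVENTS, 1000)
    print("anon counters:", anon)
    print("mapped file counters:", mapped)
    print("file share of pressure:", file_share(60, anon, mapped))
```

In this model the mapped file pages never rotate, so nearly all of the
pressure ends up on the file list even though the anon pages went through
more events.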

Tipping the balance with each new file LRU page was meant to steer the
reclaim focus towards streaming IO pages and away from anonymous pages,
but wouldn't it be easier to just not swap above a certain priority to
have the same effect? With enough used-once file pages, we should not
reach that priority threshold.
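
The priority cut-off idea can be sketched as follows. This is a toy model, not
the patch: DEF_PRIORITY is 12 in the kernel and reclaim priority counts down
towards 0, but the cut-off value, the function names and the very simplified
reclaim loop (every scanned page is freed while used-once pages remain) are
assumptions for illustration.

```python
DEF_PRIORITY = 12                  # starting reclaim priority; counts down to 0
SWAP_THRESHOLD = DEF_PRIORITY - 2  # assumed cut-off for considering anon pages


def may_scan_anon(priority):
    """Anon pages are considered only once priority has dropped to the cut-off."""
    return priority <= SWAP_THRESHOLD


def lowest_priority_reached(lru_size, used_once_pages, target):
    """Simplified reclaim loop over the file LRU.

    Each pass scans lru_size >> priority pages and frees them as long as
    used-once pages remain. Returns the priority at which `target` pages
    had been freed; if the target is never met, the loop runs down to 0.
    """
    reclaimed = 0
    for priority in range(DEF_PRIORITY, -1, -1):
        scan = lru_size >> priority
        reclaimed += min(scan, used_once_pages - reclaimed)
        if reclaimed >= target:
            return priority
    return 0


if __name__ == "__main__":
    for used_once in (100000, 10):
        prio = lowest_priority_reached(1 << 20, used_once, 32)
        print(used_once, "used-once pages: stopped at priority", prio,
              "anon considered:", may_scan_anon(prio))
```

With plenty of used-once file pages the loop is satisfied on the first pass
and never gets near the cut-off; with almost none it falls through and anon
pages come into play.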

Tipping the balance for inactive list rotation has been there from the
beginning, but I don't quite understand why. It probably was not a
problem as the conditions for inactive cycling applied to both file
and anon equally, but with used-once detection for file and deferred
file writeback from direct reclaim, we tend to cycle more file pages
on the inactive list than anonymous ones. Those rotated pages should not
be a signal to favor file reclaim, though.

Here are three RFC patches (currently under testing) that 1. prevent
swapping above DEF_PRIORITY-2, 2. treat inactive list rotations as
neutral wrt. the inter-LRU balance, and 3. revert the file list boost
on lru addition.

The result looks like this:

file:

add to lru:
[ activate: recent_scanned++, recent_rotated++ ]
[ deactivate: recent_scanned++, recent_rotated++ ]
reclaim: recent_scanned++

mapped file:

add to lru:
cycle: recent_scanned++, recent_rotated++
[ activate: recent_scanned++, recent_rotated++ ]
[ deactivate: recent_scanned++, recent_rotated++ ]
reclaim: recent_scanned++

anon:

add to lru: recent_scanned++, recent_rotated++
reactivate: recent_scanned++, recent_rotated++
deactivate: recent_scanned++, recent_rotated++
[ activate: recent_scanned++, recent_rotated++ ]
[ deactivate: recent_scanned++, recent_rotated++ ]
reclaim: recent_scanned++

As you can see, this still behaves under the assumption that refaults
from swap are more costly than from the fs, but we keep considering
anonymous pages when the file working set is thrashing.
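
For a rough feel of what the changed accounting does, the per-page counter
totals from the lists in this mail can be fed into a toy pressure formula. As
before this is a sketch under assumptions: the formula only imitates the
scanned/rotated weighting in get_scan_count(), optional steps are omitted, and
the names, the 1000-page workload and the swappiness of 60 are illustrative.

```python
# (recent_scanned, recent_rotated) per page lifecycle, summed from the
# lists in this mail with the bracketed optional steps left out.
CURRENT = {"mapped file": (3, 0), "anon": (4, 3)}
PROPOSED = {"file": (1, 0), "mapped file": (2, 1), "anon": (4, 3)}

PAGES = 1000      # arbitrary workload size
SWAPPINESS = 60   # arbitrary setting for the example


def pressure(prio, counts, pages=PAGES):
    """Assumed formula: pressure grows with scanned and shrinks with rotated."""
    scanned, rotated = counts
    return (prio + 1) * (scanned * pages + 1) // (rotated * pages + 1)


def file_share(anon_counts, file_counts, swappiness=SWAPPINESS):
    """Fraction of the total scan pressure that lands on the file LRU."""
    ap = pressure(swappiness, anon_counts)
    fp = pressure(200 - swappiness, file_counts)
    return fp / (ap + fp + 1)


if __name__ == "__main__":
    print("current,  mapped file:",
          round(file_share(CURRENT["anon"], CURRENT["mapped file"]), 4))
    print("proposed, mapped file:",
          round(file_share(PROPOSED["anon"], PROPOSED["mapped file"]), 4))
    print("proposed, used-once file:",
          round(file_share(PROPOSED["anon"], PROPOSED["file"]), 4))
```

In this model a cycling mapped file working set no longer soaks up almost all
of the pressure, while plain used-once file pages stay the preferred target.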

What do reclaim people think about this?

Konstantin, would you have the chance to try this set directly with
your affected workload if nobody spots any obvious problems?

Thanks!