Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen

From: Wu Fengguang
Date: Fri May 08 2009 - 04:16:45 EST


On Fri, May 08, 2009 at 04:44:10AM +0800, Andrew Morton wrote:
> On Thu, 7 May 2009 17:10:39 +0200
> Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> > > +++ linux/mm/nommu.c
> > > @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file
> > > added_exe_file_vma(current->mm);
> > > vma->vm_mm = current->mm;
> > > }
> > > + if (vm_flags & VM_EXEC)
> > > + set_bit(AS_EXEC, &file->f_mapping->flags);
> > > }
> >
> > I find it a bit ugly that it applies an attribute of the memory area
> > (per mm) to the page cache mapping (shared). Because this in turn
> > means that the reference through a non-executable vma might get the
> > pages rotated just because there is/was an executable mmap around.
>
> Yes, it's not good. That AS_EXEC bit will hang around for arbitrarily
> long periods in the inode cache. So we'll have AS_EXEC set on an
> entire file because someone mapped some of it with PROT_EXEC half an
> hour ago. Where's the sense in that?

Yes, that nonsensical case is possible, but it should be rare.

AS_EXEC means "this is (likely) an executable file".
It has broader coverage in both space and time (see the sketch below
for how the two checks differ):

- it protects the whole file instead of only the text section
- it also protects the many executables/libraries that typically
  run only briefly but perhaps frequently, e.g. ls, cat, git, gcc,
  perl, python, ...
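
For reference, a minimal side-by-side sketch of the two approaches
(condensed from the quoted diffs, not literal kernel code):

        /*
         * AS_EXEC (the dropped approach): tag the whole mapping at
         * mmap time; the hint outlives the vma and covers every
         * page of the file.
         */
        if (vm_flags & VM_EXEC)
                set_bit(AS_EXEC, &file->f_mapping->flags);

        /*
         * VM_EXEC (this patch): act only on the vm_flags that
         * page_referenced() just collected from the vmas actually
         * referencing this page.
         */
        if ((vm_flags & VM_EXEC) && !PageAnon(page))
                list_add(&page->lru, &l_active);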

But none of the above cases matters as much for the user experience
as the currently running code does, so here is the new patch (it
applies on top of "vmscan: report vm_flags in page_referenced()").

Thanks,
Fengguang
---
vmscan: make mapped executable pages the first class citizen

Protect referenced PROT_EXEC mapped pages from being deactivated.

PROT_EXEC (or its internal representation VM_EXEC) pages normally
belong to currently running executables and their linked libraries;
they should be cached aggressively to provide a good user experience.
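
The check builds on the prerequisite patch named above; roughly
(a simplified sketch of the interface, not the literal code):

        unsigned long vm_flags;

        /*
         * page_referenced() now also reports, via &vm_flags, the
         * union of vma->vm_flags over all vmas that referenced the
         * page, so the caller can tell executable references apart.
         */
        if (page_referenced(page, 0, sc->mem_cgroup, &vm_flags) &&
            (vm_flags & VM_EXEC) && !PageAnon(page))
                list_add(&page->lru, &l_active);  /* keep it active */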

CC: Elladan <elladan@xxxxxxxxxx>
CC: Nick Piggin <npiggin@xxxxxxx>
CC: Johannes Weiner <hannes@xxxxxxxxxxx>
CC: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>
CC: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
Acked-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Acked-by: Rik van Riel <riel@xxxxxxxxxx>
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
mm/vmscan.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1233,6 +1233,7 @@ static void shrink_active_list(unsigned
unsigned long pgscanned;
unsigned long vm_flags;
LIST_HEAD(l_hold); /* The pages which were snipped off */
+ LIST_HEAD(l_active);
LIST_HEAD(l_inactive);
struct page *page;
struct pagevec pvec;
@@ -1272,8 +1273,13 @@ static void shrink_active_list(unsigned

/* page_referenced clears PageReferenced */
if (page_mapping_inuse(page) &&
- page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
+ page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
pgmoved++;
+ if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
+ list_add(&page->lru, &l_active);
+ continue;
+ }
+ }

list_add(&page->lru, &l_inactive);
}
@@ -1282,7 +1288,6 @@ static void shrink_active_list(unsigned
* Move the pages to the [file or anon] inactive list.
*/
pagevec_init(&pvec, 1);
- lru = LRU_BASE + file * LRU_FILE;

spin_lock_irq(&zone->lru_lock);
/*
@@ -1294,6 +1299,7 @@ static void shrink_active_list(unsigned
reclaim_stat->recent_rotated[!!file] += pgmoved;

pgmoved = 0; /* count pages moved to inactive list */
+ lru = LRU_BASE + file * LRU_FILE;
while (!list_empty(&l_inactive)) {
page = lru_to_page(&l_inactive);
prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1316,6 +1322,29 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
__count_zone_vm_events(PGREFILL, zone, pgscanned);
__count_vm_events(PGDEACTIVATE, pgmoved);
+
+ pgmoved = 0; /* count pages moved back to active list */
+ lru = LRU_ACTIVE + file * LRU_FILE;
+ while (!list_empty(&l_active)) {
+ page = lru_to_page(&l_active);
+ prefetchw_prev_lru_page(page, &l_active, flags);
+ VM_BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ VM_BUG_ON(!PageActive(page));
+
+ list_move(&page->lru, &zone->lru[lru].list);
+ mem_cgroup_add_lru_list(page, lru);
+ pgmoved++;
+ if (!pagevec_add(&pvec, page)) {
+ spin_unlock_irq(&zone->lru_lock);
+ if (buffer_heads_over_limit)
+ pagevec_strip(&pvec);
+ __pagevec_release(&pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+ }
+ __mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
spin_unlock_irq(&zone->lru_lock);
if (buffer_heads_over_limit)
pagevec_strip(&pvec);
--