Re: [PATCH 9/10] mm/memcg: move lru_lock into lruvec

From: Konstantin Khlebnikov
Date: Thu Feb 23 2012 - 09:21:07 EST


Hugh Dickins wrote:
On Wed, 22 Feb 2012, Konstantin Khlebnikov wrote:
Hugh Dickins wrote:
On Wed, 22 Feb 2012, Konstantin Khlebnikov wrote:
Hugh Dickins wrote:

I'll have to come back to think about your locking later too;
or maybe that's exactly where I need to look, when investigating
the mm_inline.h:41 BUG.

pages_count[] updates looks correct.
This really may be bug in locking, and this VM_BUG_ON catch it before
list-debug.

I've still not got into looking at it yet.

You're right to mention DEBUG_LIST: I have that on some of the machines,
and I would expect that to be the first to catch a mislocking issue.

In the past my problems with that BUG (well, the spur to introduce it)
came from hugepages.

My patchset hasn't your mem_cgroup_reset_uncharged_to_root protection,
or something to replace it. So, there exist race between cgroup remove and
isolated uncharged page put-back, but it shouldn't corrupt lru lists.
There something different.

Yes, I'm not into removing cgroups yet.

Ok, my v3 patchset can deal with cgroups removing. At least I believe. =)

I was implemented isolated pages counter.
Seems like overhead isn't fatal and can be reduced.
Plus these counters can be used not only as reference counters,
they provides useful statistics for reclaimer.


I've got it: your "can differ only on lumpy reclaim" belief, first
commented in 17/22 but then assumed in 20/22, is wrong: those swapin
readahead pages, for example, may shift from root_mem_cgroup to another
mem_cgroup while the page is isolated by shrink_active or shrink_inactive.

Ok, thanks.


Patch below against the top of my version of your tree: probably won't
quite apply to yours, since we used different bases here; but easy
enough to correct yours from it.

Bisection was misleading: it appeared to be much easier to reproduce
with 22/22 taken off, and led to 16/22, but that's because that one
introduced a similar bug, which actually got fixed in 22/22:

relock_page_lruvec() and relock_page_lruvec_irq() in 16/22 onwards
are wrong, in each case the if block needs an
} else
lruvec = page_lruvec(page);

Ok, fixed in v3


You'll want to fix that in 16/22, but here's the patch for the end state:

Signed-off-by: Hugh Dickins<hughd@xxxxxxxxxx>
but forget that, just quietly fold the fixes into yours!

This actually reverts my "mm: optimize putback for 0-order reclaim",
so I removed this wrong optimization in v3

---
mm/vmscan.c | 20 ++++++--------------
1 file changed, 6 insertions(+), 14 deletions(-)

--- 3033K2.orig/mm/vmscan.c 2012-02-21 00:02:13.000000000 -0800
+++ 3033K2/mm/vmscan.c 2012-02-21 21:23:25.768381375 -0800
@@ -1342,7 +1342,6 @@ static int too_many_isolated(struct zone
*/
static noinline_for_stack struct lruvec *
putback_inactive_pages(struct lruvec *lruvec,
- struct scan_control *sc,
struct list_head *page_list)
{
struct zone_reclaim_stat *reclaim_stat =&lruvec->reclaim_stat;
@@ -1364,11 +1363,8 @@ putback_inactive_pages(struct lruvec *lr
continue;
}

- /* can differ only on lumpy reclaim */
- if (sc->order) {
- lruvec = __relock_page_lruvec(lruvec, page);
- reclaim_stat =&lruvec->reclaim_stat;
- }
+ lruvec = __relock_page_lruvec(lruvec, page);
+ reclaim_stat =&lruvec->reclaim_stat;

SetPageLRU(page);
lru = page_lru(page);
@@ -1566,7 +1562,7 @@ shrink_inactive_list(unsigned long nr_to
__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
__count_zone_vm_events(PGSTEAL, zone, nr_reclaimed);

- lruvec = putback_inactive_pages(lruvec, sc,&page_list);
+ lruvec = putback_inactive_pages(lruvec,&page_list);

__mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
__mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
@@ -1631,7 +1627,6 @@ shrink_inactive_list(unsigned long nr_to

static struct lruvec *
move_active_pages_to_lru(struct lruvec *lruvec,
- struct scan_control *sc,
struct list_head *list,
struct list_head *pages_to_free,
enum lru_list lru)
@@ -1643,10 +1638,7 @@ move_active_pages_to_lru(struct lruvec *
int numpages;

page = lru_to_page(list);
-
- /* can differ only on lumpy reclaim */
- if (sc->order)
- lruvec = __relock_page_lruvec(lruvec, page);
+ lruvec = __relock_page_lruvec(lruvec, page);

VM_BUG_ON(PageLRU(page));
SetPageLRU(page);
@@ -1770,9 +1762,9 @@ static void shrink_active_list(unsigned
*/
reclaim_stat->recent_rotated[file] += nr_rotated;

- lruvec = move_active_pages_to_lru(lruvec, sc,&l_active,&l_hold,
+ lruvec = move_active_pages_to_lru(lruvec,&l_active,&l_hold,
LRU_ACTIVE + file * LRU_FILE);
- lruvec = move_active_pages_to_lru(lruvec, sc,&l_inactive,&l_hold,
+ lruvec = move_active_pages_to_lru(lruvec,&l_inactive,&l_hold,
LRU_BASE + file * LRU_FILE);
__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -nr_taken);
unlock_lruvec_irq(lruvec);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/