[PATCH] vmscan: memcg: check whether the low limit should be ignored

From: Michal Hocko
Date: Mon May 05 2014 - 09:12:18 EST


The low limit (aka guarantee) is ignored when no group is scanned
during the first round of __shrink_zone. This approach doesn't work when
multiple reclaimers race to reclaim the same hierarchy (e.g. kswapd
vs. direct reclaim, or multiple tasks hitting the hard limit) because
the memcg iterator makes sure that multiple reclaimers are interleaved
in the hierarchy. This means that some reclaimers can see 0 scanned
groups even though there are groups which are above the low limit and
were reclaimed on behalf of other reclaimers. This leads to a premature
low-limit break.
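
To illustrate the race, here is a minimal userspace sketch, not kernel
code. It assumes a strictly alternating pair of reclaimers pulling from one
shared cursor; struct group, struct shared_iter, iter_next() and
interleaved_round() are hypothetical names made up for this example, and the
real memcg iterator is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg: only the "above low limit" bit matters. */
struct group {
	bool above_low_limit;
};

/* Shared cursor, modelling iterator state that racing reclaimers share. */
struct shared_iter {
	size_t pos;
};

/* Hand out the next group index, or -1 once the round is over. */
static int iter_next(struct shared_iter *it, size_t ngroups)
{
	if (it->pos >= ngroups)
		return -1;
	return (int)it->pos++;
}

/*
 * Two reclaimers take turns pulling groups from the shared iterator.
 * scanned[i] receives the number of reclaimable groups reclaimer i saw.
 */
static void interleaved_round(const struct group *groups, size_t ngroups,
			      unsigned scanned[2])
{
	struct shared_iter it = { .pos = 0 };
	int who = 0;
	int idx;

	assert(ngroups > 0);
	scanned[0] = scanned[1] = 0;

	while ((idx = iter_next(&it, ngroups)) >= 0) {
		if (groups[idx].above_low_limit)
			scanned[who]++;
		who = !who;	/* strict alternation is a simplification */
	}
}
```

With four groups where only the second and fourth are above their low limit,
the first reclaimer in this model is handed only protected groups and scans
nothing, although the hierarchy clearly has reclaimable groups.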

This patch adds mem_cgroup_all_within_guarantee(), which checks
whether all the groups in the reclaimed hierarchy are within their low
limit; shrink_zone allows the fallback reclaim only when that is
true. This alone is not sufficient, however, because it would lead
to another problem: if a reclaimer constantly fails to scan anything
because it sees only groups within their guarantees while others do the
reclaim, the reclaim priority would drop very quickly.
shrink_zone therefore has to preserve the "scan at least one group"
semantic, so __shrink_zone is retried until at least one group
is scanned.
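
The retry logic can be sketched in userspace as follows. This is only a
model under stated assumptions: struct hierarchy_model, shrink_once() and
shrink_retry() are hypothetical, and lost_rounds stands in for passes in
which racing reclaimers took every group. all_within_guarantee() plays the
role of mem_cgroup_all_within_guarantee() in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one hierarchy as seen by a single reclaimer. */
struct hierarchy_model {
	const bool *above_low_limit;	/* per group: usage exceeds low limit */
	size_t ngroups;
	unsigned lost_rounds;		/* passes lost to racing reclaimers */
};

/* True only if no group in the hierarchy is above its low limit. */
static bool all_within_guarantee(const struct hierarchy_model *h)
{
	for (size_t i = 0; i < h->ngroups; i++)
		if (h->above_low_limit[i])
			return false;
	return true;
}

/* One reclaim pass; returns the number of groups scanned. */
static unsigned shrink_once(struct hierarchy_model *h, bool honor_guarantee)
{
	unsigned scanned = 0;

	if (h->lost_rounds > 0) {
		h->lost_rounds--;	/* other reclaimers got everything */
		return 0;
	}
	for (size_t i = 0; i < h->ngroups; i++)
		if (!honor_guarantee || h->above_low_limit[i])
			scanned++;
	return scanned;
}

/*
 * Retry until at least one group is scanned. The guarantee is dropped
 * only when the whole hierarchy is within it. Returns the pass count.
 */
static unsigned shrink_retry(struct hierarchy_model *h, bool *guarantee_dropped)
{
	bool honor_guarantee = true;
	unsigned passes = 1;

	assert(h->ngroups > 0);
	while (!shrink_once(h, honor_guarantee)) {
		if (all_within_guarantee(h))
			honor_guarantee = false;
		passes++;
	}
	*guarantee_dropped = !honor_guarantee;
	return passes;
}
```

In this model a reclaimer that loses two passes to a race simply retries and
keeps honoring the guarantee, while a hierarchy that is entirely within its
guarantee falls back to ignoring it on the second pass.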

Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
---
include/linux/memcontrol.h | 5 +++++
mm/memcontrol.c | 13 +++++++++++++
mm/vmscan.c | 17 ++++++++++++-----
3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c00ccc5f70b9..077a777bd9ff 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -94,6 +94,7 @@ bool task_in_mem_cgroup(struct task_struct *task,

extern bool mem_cgroup_within_guarantee(struct mem_cgroup *memcg,
struct mem_cgroup *root);
+extern bool mem_cgroup_all_within_guarantee(struct mem_cgroup *root);

extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page);
extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
@@ -296,6 +297,10 @@ static inline bool mem_cgroup_within_guarantee(struct mem_cgroup *memcg,
{
return false;
}
+static inline bool mem_cgroup_all_within_guarantee(struct mem_cgroup *root)
+{
+ return false;
+}

static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 58982d18f6ea..4fd4784d1548 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2833,6 +2833,19 @@ bool mem_cgroup_within_guarantee(struct mem_cgroup *memcg,
return false;
}

+bool mem_cgroup_all_within_guarantee(struct mem_cgroup *root)
+{
+ struct mem_cgroup *iter;
+
+ for_each_mem_cgroup_tree(iter, root)
+ if (!mem_cgroup_within_guarantee(iter, root)) {
+ mem_cgroup_iter_break(root, iter);
+ return false;
+ }
+
+ return true;
+}
+
struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
{
struct mem_cgroup *memcg = NULL;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5f923999bb79..2686e47f04cc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2293,13 +2293,20 @@ static unsigned __shrink_zone(struct zone *zone, struct scan_control *sc,

static void shrink_zone(struct zone *zone, struct scan_control *sc)
{
- if (!__shrink_zone(zone, sc, true)) {
+ bool honor_guarantee = true;
+
+ while (!__shrink_zone(zone, sc, honor_guarantee)) {
/*
- * First round of reclaim didn't find anything to reclaim
- * because of the memory guantees for all memcgs in the
- * reclaim target so try again and ignore guarantees this time.
+ * The previous round of reclaim didn't find anything to scan
+ * because
+ * a) the whole reclaimed hierarchy is within guarantee so
+ * we fall back to ignoring the guarantee because the other
+ * option would be the OOM
+ * b) multiple reclaimers are racing and so the first round
+ * should be retried
*/
- __shrink_zone(zone, sc, false);
+ if (mem_cgroup_all_within_guarantee(sc->target_mem_cgroup))
+ honor_guarantee = false;
}
}

--
2.0.0.rc0

--
Michal Hocko
SUSE Labs