Re: [PATCH v2 2/3] mm: memcontrol: clean up and document effective low/min calculations

From: Michal Koutný
Date: Fri Feb 21 2020 - 12:10:35 EST


On Thu, Dec 19, 2019 at 03:07:17PM -0500, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> The effective protection of any given cgroup is a somewhat complicated
> construct that depends on the ancestor's configuration, siblings'
> configurations, as well as current memory utilization in all these
> groups.
I agree with that. It makes it a bit hard to determine the equilibrium
in advance.


> + * Consider the following example tree:
> *
> + *          A      A/memory.low = 2G, A/memory.current = 6G
> + *         //\\
> + *        BC  DE   B/memory.low = 3G  B/memory.current = 2G
> + *                 C/memory.low = 1G  C/memory.current = 2G
> + *                 D/memory.low = 0   D/memory.current = 2G
> + *                 E/memory.low = 10G E/memory.current = 0
> *
> + * and memory pressure is applied, the following memory
> + * distribution is expected (approximately*):
> *
> + * A/memory.current = 2G
> + * B/memory.current = 1.3G
> + * C/memory.current = 0.6G
> + * D/memory.current = 0
> + * E/memory.current = 0
> *
> + * *assuming equal allocation rate and reclaimability
I think the assumptions for this example don't hold (anymore).
Because the reclaim rate depends on the usage above protection, the
siblings won't be reclaimed equally, so the low_usage proportions will
change over time and the equilibrium distribution is IMO different (I'm
attaching an Octave script to calculate it).

As the result depends on the initial usage, I don't think such a general
example can be given (for the overcommit case).
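
To make the argument concrete, here is a minimal C sketch (not kernel
code) of a single step for the example tree. The helper names
effective_protection_sketch() and reclaim_sketch() are made up for
illustration: the first mirrors the "normalize overcommit" step of the
attached script and leaves out the recursive-protection part, the second
mirrors its scan-pressure scaling without the epsilon smoothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not kernel code: smaller of two doubles. */
double min_d(double a, double b)
{
	return a < b ? a : b;
}

/*
 * Distribute the parent's effective protection E among len siblings in
 * proportion to low_usage = min(usage, nominal protection).
 * n[] holds nominal protections, c[] current usage, e[] the result.
 * Assumption: recursive protection is ignored in this sketch.
 */
void effective_protection_sketch(double E, const double *n,
				 const double *c, size_t len, double *e)
{
	double siblings = 0.0;
	double scale;
	size_t i;

	assert(E >= 0.0);

	for (i = 0; i < len; i++)
		siblings += min_d(c[i], n[i]);

	/* Scale down only when the siblings overcommit the parent. */
	scale = (siblings > E) ? E / siblings : 1.0;

	for (i = 0; i < len; i++)
		e[i] = min_d(c[i], n[i]) * scale;
}

/*
 * Amount reclaimed from one sibling in one step: a fraction alpha of
 * its usage c, scaled by the share of usage above its protection e.
 */
double reclaim_sketch(double alpha, double c, double e)
{
	double sz = c > e ? c : e;

	if (sz <= 0.0)
		return 0.0;	/* nothing charged, nothing to reclaim */
	return alpha * c * (1.0 - e / sz);
}
```

With E = 2, n = {3, 1, 0, 10} and c = {2, 2, 2, 0}, the first step gives
e of roughly {1.33, 0.67, 0, 0}, which matches the quoted 1.3G/0.6G
split. With alpha = 0.1 the reclaimed amounts are about
{0.067, 0.133, 0.2, 0}, though: B and C are not reclaimed equally, so B's
low_usage shrinks on the next step while C's stays at its nominal 1G for
as long as C uses more than that.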


> @@ -6272,12 +6262,63 @@ struct cgroup_subsys memory_cgrp_subsys = {
> * for next usage. This part is intentionally racy, but it's ok,
> * as memory.low is a best-effort mechanism.
This is a different issue, but since this patch updates the docs I'm
mentioning it: we treat memory.min the same way, i.e. it's subject to
the same race; however, it's not meant to be best-effort. I didn't look
into the outcomes of potential misaccounting, but the comment seems to
miss the impact on memory.min protection.

> @@ -6292,52 +6333,29 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> [...]
> +	if (parent == root) {
> +		memcg->memory.emin = memcg->memory.min;
> +		memcg->memory.elow = memcg->memory.low;
> +		goto out;
>  	}
Shouldn't this condition be 'if (parent == root_mem_cgroup)'? (I.e. 1st
level takes direct input, but 2nd and further levels redistribute only
what they really got from parent.)
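
To spell the question out, here is a small C sketch of the two checks.
struct cg_sketch and both helper names are hypothetical; only the parent
link is modelled and nothing here is taken from the kernel sources beyond
the quoted condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: parent link only. */
struct cg_sketch {
	const struct cg_sketch *parent;
};

/* Check as posted: compare the parent with the root of this reclaim. */
bool direct_input_as_posted(const struct cg_sketch *memcg,
			    const struct cg_sketch *reclaim_root)
{
	assert(memcg != NULL);
	return memcg->parent == reclaim_root;
}

/* Check as asked about: compare the parent with the hierarchy root. */
bool direct_input_as_suggested(const struct cg_sketch *memcg,
			       const struct cg_sketch *hierarchy_root)
{
	assert(memcg != NULL);
	return memcg->parent == hierarchy_root;
}
```

For a tree top -> A -> B, both variants agree on A under global reclaim.
They differ for B when reclaim is rooted at A: the posted check lets B
take its nominal memory.low directly, whereas comparing against the
hierarchy root would make B redistribute only what A effectively got.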


Michal

% run as: octave-cli script
%
% Input configurations
% -------------------
% E    parent effective protection
% n    nominal protection of siblings set at the given level
% c    current consumption of siblings at the given level

% example from effective_protection 3.
E = 2;
n = [3 1 0 10];
c = [2 2 2 0]; % this converges to [1.16 0.84 0 0]
% c = [6 2 2 0]; % keeps ratio [1.5 0.5 0 0]
% c = [5 2 2 0]; % mixed ratio [1.45 0.55 0 0]
% c = [8 2 2 0]; % mixed ratio [1.53 0.47 0 0]

% example from effective_protection 5.
%E = 2;
%n = [1 0];
%c = [2 1]; % coming from close to equilibrium -> [1.50 0.50]
%c = [100 100]; % coming from "infinity" -> [1.50 0.50]
%c = [2 2]; % coming from uniformity -> [1.33 0.67]

% example of recursion by default
%E = 2;
%n = [0 0];
%c = [2 1]; % coming from imbalance -> [1.33 0.67]
%c = [100 100]; % coming from "infinity" -> [1.00 1.00]
%c = [2 2]; % coming from uniformity -> [1.00 1.00]

% example by using infinities (_without_ recursive protection)
%E = 2;
%n = [1e7 1e7];
%c = [2 1]; % coming from imbalance -> [1.33 0.67]
%c = [100 100]; % coming from "infinity" -> [1.00 1.00]
%c = [2 2]; % coming from uniformity -> [1.00 1.00]

% Reclaim parameters
% ------------------

% Minimal reclaim amount (GB)
cluster = 4e-6;

% Reclaim coefficient (think of it as 0.5^sc->priority)
alpha = .1;

% Simulation parameters
% ---------------------
epsilon = 1e-7;
timeout = 1000;

% Simulation loop
% ---------------------
% The simulation assumes the siblings consumed the initial amount of memory
% (w/out reclaim) and then reclaim starts; all memory is reclaimable, i.e.
% treated the same. It simulates only non-low reclaim and assumes all
% memory.min = 0.

ch = [];
eh = [];
rh = [];

for t = 1:timeout
    % low_usage
    u = min(c, n);
    siblings = sum(u);

    % effective_protection()
    protected = min(n, c); % start with nominal
    e = protected * min(1, E / siblings); % normalize overcommit

    % recursive protection
    unclaimed = max(0, E - siblings);
    parent_overuse = sum(c) - siblings;
    if (unclaimed > 0 && parent_overuse > 0)
        overuse = max(0, c - protected);
        e += unclaimed * (overuse / parent_overuse);
    endif

    % get_scan_count()
    r = alpha * c; % assume all memory is in a single LRU list

    % 1bc63fb1272b ("mm, memcg: make scan aggression always exclude protection")
    sz = max(e, c);
    r .*= (1 - (e+epsilon) ./ (sz+epsilon));

    % debug prints (comment out to silence)
    e, c, r

    % nothing to reclaim, reached equilibrium
    if max(r) < epsilon
        break;
    endif

    % SWAP_CLUSTER_MAX
    r = max(r, (r > epsilon) .* cluster);
    c = max(c - r, 0);

    ch = [ch ; c];
    eh = [eh ; e];
    rh = [rh ; r];
endfor

t
c, e
plot([ch, eh])
pause()