Re: [RFC] Making memcg track ownership per address_space or anon_vma

From: Greg Thelen
Date: Thu Feb 05 2015 - 19:03:45 EST



On Thu, Feb 05 2015, Tejun Heo wrote:

> Hey,
>
> On Thu, Feb 05, 2015 at 02:05:19PM -0800, Greg Thelen wrote:
>> > A
>> > +-B (usage=2M lim=3M min=2M hosted_usage=2M)
>> > +-C (usage=0 lim=2M min=1M shared_usage=2M)
>> > +-D (usage=0 lim=2M min=1M shared_usage=2M)
>> > \-E (usage=0 lim=2M min=0)
> ...
>> Maybe, but I want to understand more about how pressure works in the
>> child. As C (or D) allocates non shared memory does it perform reclaim
>> to ensure that its (C.usage + C.shared_usage < C.lim). Given C's
>
> Yes.
>
>> shared_usage is linked into B.LRU it wouldn't be naturally reclaimable
>> by C. Are you thinking that charge failures on cgroups with non zero
>> shared_usage would, as needed, induce reclaim of parent's hosted_usage?
>
> Hmmm.... I'm not really sure but why not? If we properly account for
> the low protection when pushing inodes to the parent, I don't think
> it'd break anything. IOW, allow the amount beyond the sum of low
> limits to be reclaimed when one of the sharers is under pressure.
>
> Thanks.

I'm not saying that it'd break anything. I think it's required that
children perform reclaim on shared data hosted in the parent. The child
is limited by shared_usage, so it needs the ability to reclaim it. So I
think we're in agreement. Child will reclaim parent's hosted_usage when
the child is charged for shared_usage. Ideally the only parental memory
reclaimed in this situation would be shared. But I think (though I
can't claim to have followed the new memcg philosophy discussions) that
internal nodes in the cgroup tree (i.e. parents) do not have any
resources charged directly to them. All resources are charged to leaf
cgroups which linger until resources are uncharged. Thus the LRUs of
a parent will only contain hosted (shared) memory. This thankfully
focuses parental reclaim on shared pages. Child pressure will,
unfortunately, reclaim shared pages used by any container. But if
shared pages were charged to all sharing containers, then reclaiming
them will help relieve pressure in the caller.

So this is a system which charges all cgroups using a shared inode
(recharge on read) for all resident pages of that shared inode. There's
only one copy of the page in memory on just one LRU, but the page may be
charged to multiple containers' (shared_)usage.
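
To make sure I'm describing the same accounting, here is a toy userspace
model of it. This is only a sketch, not kernel code: the class and method
names (Cgroup, SharedInode, join, fault_in, reclaim) are made up for
illustration, and I'm assuming that the host's usage includes its
hosted_usage, as the usage=2M hosted_usage=2M figures suggest.

```python
from dataclasses import dataclass, field

MiB = 1 << 20


@dataclass(eq=False)
class Cgroup:
    name: str
    lim: int
    usage: int = 0          # bytes charged directly (assumed to include hosted)
    hosted_usage: int = 0   # bytes on this cgroup's LRU, hosted for children
    shared_usage: int = 0   # bytes charged for inodes hosted in the parent


@dataclass(eq=False)
class SharedInode:
    host: Cgroup                                  # parent whose LRU holds the pages
    sharers: list = field(default_factory=list)   # children using the inode
    resident: int = 0                             # bytes of the inode in memory

    def join(self, cg):
        """Recharge on read: a new sharer is charged for all resident pages."""
        if not any(s is cg for s in self.sharers):
            self.sharers.append(cg)
            cg.shared_usage += self.resident

    def fault_in(self, cg, nbytes):
        """cg faults nbytes in: one copy on the host LRU, charged to all sharers."""
        self.join(cg)
        self.resident += nbytes
        self.host.usage += nbytes
        self.host.hosted_usage += nbytes
        for s in self.sharers:
            s.shared_usage += nbytes

    def reclaim(self, nbytes):
        """Reclaim from the host LRU; every sharer's charge drops with it."""
        nbytes = min(nbytes, self.resident)
        self.resident -= nbytes
        self.host.usage -= nbytes
        self.host.hosted_usage -= nbytes
        for s in self.sharers:
            s.shared_usage -= nbytes
        return nbytes


b = Cgroup("B", lim=3 * MiB)
c = Cgroup("C", lim=2 * MiB)
d = Cgroup("D", lim=2 * MiB)
inode = SharedInode(host=b)
inode.fault_in(c, 2 * MiB)   # C reads the file in
inode.join(d)                # D reads pages that are already resident
```

In this model the 2M of page cache exists once (B.hosted_usage) but is
charged twice (C.shared_usage and D.shared_usage), and a reclaim driven by
either child lowers the shared_usage of both.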

Perhaps I missed it, but what happens when a child's limit is
insufficient to accept all pages shared by its siblings? Example
starting with 2M cached of a shared file:

A
+-B    (usage=2M lim=3M hosted_usage=2M)
  +-C  (usage=0  lim=2M shared_usage=2M)
  +-D  (usage=0  lim=2M shared_usage=2M)
  \-E  (usage=0  lim=1M shared_usage=0)

If E faults in a new 4K page within the shared file, then E is a sharing
participant so it'd be charged the 2M+4K, which pushes E over its
limit.
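
Spelling out the arithmetic for E as a quick sketch. The variable names are
mine, and it assumes E gets charged for every resident page of the file
the moment it becomes a sharer:

```python
MiB = 1 << 20
PAGE = 4096                      # the new 4K page E faults in

resident_before = 2 * MiB        # shared file already cached under B
e_lim = 1 * MiB                  # E's limit in the example above

e_charge = resident_before + PAGE    # what E would be charged as a sharer
overage = e_charge - e_lim           # bytes over E's limit
overage_pages = overage // PAGE

print(f"E charged {e_charge} bytes against a limit of {e_lim}")
print(f"over by {overage} bytes ({overage_pages} pages)")
```

So E could only be brought back under its limit by reclaiming more than 1M
of the shared file from B's LRU, and those are pages that C and D are also
using.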