Re: memcg uncharge page counter mismatch

From: Minchan Kim
Date: Fri Dec 04 2015 - 08:35:54 EST


On Fri, Dec 04, 2015 at 10:58:15AM +0100, Michal Hocko wrote:
> On Fri 04-12-15 18:16:34, Minchan Kim wrote:
> > On Fri, Dec 04, 2015 at 09:52:27AM +0100, Michal Hocko wrote:
> > > On Fri 04-12-15 14:35:15, Minchan Kim wrote:
> > > > On Thu, Dec 03, 2015 at 04:47:29PM +0100, Michal Hocko wrote:
> > > > > On Thu 03-12-15 15:58:50, Michal Hocko wrote:
> > > > > [....]
> > > > > > Warning, this looks ugly as hell.
> > > > >
> > > > > I was thinking about it some more and it seems that we should rather not
> > > > > bother with partial thp at all and keep it in the original memcg
> > > > > instead. It is way much less code and I do not think this will be too
> > > > > disruptive. Somebody should be holding the thp head, right?
> > > > >
> > > > > Minchan, does this fix the issue you are seeing.
> > > >
> > > > This patch solves the issue but not sure it's right approach.
> > > > I think it could make regression that in old, we could charge
> > > > a THP page but we can't now.
> > >
> > > The page would still get charged when allocated. It just wouldn't get
> > > moved when mapped only partially. IIUC there will be still somebody
> > > mapping the THP head via pmd, right? That process will move the page to
> >
> > If I read the code correctly, no. split_huge_pmd splits just the pmd,
> > not the page itself. IOW, !pmd_trans_huge(pmd) && PageTransHuge is
> > possible even though only one process owns the page.
>
> I am not sure I follow you. I thought there would still be other pmd
> which will hold the THP. Why should we keep the page as huge when all
> processes which map it have already split it up?

I haven't followed Kirill's work; I only read part of the code while
implementing MADV_FREE, so this is just a guess.
High-order allocation, compaction, split and collapse are costly operations,
so the new work tries to avoid splitting the page as far as possible.
For example, because it works by splitting the pmd rather than the THP page,
it doesn't split the THP page in the mprotect path.
It can even delay the page split via deferred_split_huge_page
even if the THP page is freed.

>
> On the other hand it is true that the last process which maps the whole
> thp might have exited and leave others to map it partially.
>
> > > the new memcg when moved. Or is it possible that we will end up only
> > > with pte mapped THP from all processes? Kirill?
> >
> > I'm not Kirill but I think it's possible.
> > If so, one thing we can use is page_mapcount(page) == 1. With that,
> > we could guarantee that only one process owns the page, so charge 512 instead of 1?
>
> Alright the exclusive holder should indeed move it. I will think how to
> simplify the previous patch (has it helped in your testing btw.?).

At least your patch no longer triggers the WARNING, but I didn't check
whether the accounting is right.
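
To make the page_mapcount(page) == 1 idea concrete, here is a rough
userspace sketch of the decision, not kernel code. struct sketch_page,
sketch_nr_pages_to_move() and SKETCH_HPAGE_NR are made-up names for
illustration, and 512 assumes 4KB base pages with a 2MB THP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2MB THP built from 4KB base pages, i.e. 512 subpages. */
#define SKETCH_HPAGE_NR 512

/* Hypothetical stand-in for the few page properties the check needs. */
struct sketch_page {
	bool trans_huge;	/* models PageTransHuge(page) */
	bool pmd_mapped;	/* models pmd_trans_huge(pmd) in the moving task */
	int mapcount;		/* models page_mapcount(page) */
};

/*
 * Return how many base pages would be moved to the new memcg.
 * 0 means: leave the page charged to the original memcg.
 */
static unsigned int sketch_nr_pages_to_move(const struct sketch_page *p)
{
	assert(p != NULL);

	/* Ordinary page: move just that one page. */
	if (!p->trans_huge)
		return 1;

	/* THP still mapped by a pmd in this task: move the whole thing. */
	if (p->pmd_mapped)
		return SKETCH_HPAGE_NR;

	/*
	 * pte-mapped THP (pmd was split, page was not). If this task is
	 * the only mapper, treat it as the exclusive owner and move all
	 * 512 pages; otherwise keep the charge where it is.
	 */
	if (p->mapcount == 1)
		return SKETCH_HPAGE_NR;

	return 0;
}
```

The point is only that a pte-mapped THP is moved as a whole or not at
all, so the charge and uncharge counts stay in units that match.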

Thanks.

>
> --
> Michal Hocko
> SUSE Labs

--
Kind regards,
Minchan Kim