Re: [PATCH] mm: fix unsafe page -> lruvec lookups with cgroup charge migration

From: Shakeel Butt
Date: Thu Nov 21 2019 - 16:31:08 EST


On Thu, Nov 21, 2019 at 12:56 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Nov 20, 2019 at 07:15:27PM -0800, Hugh Dickins wrote:
> > I like the way you've rearranged isolate_lru_page() there, but I
> > don't think it amounts to more than a cleanup. Very good thinking
> > about the odd "lruvec->pgdat = pgdat" case tucked away inside
> > mem_cgroup_page_lruvec(), but actually, what harm does it do, if
> > mem_cgroup_move_account() changes page->mem_cgroup concurrently?
> >
> > You say use-after-free, but we have spin_lock_irq here, and the
> > struct mem_cgroup (and its lruvecs) cannot be freed until an RCU
> > grace period expires, which we rely upon in many places, and which
> > cannot happen until after the spin_unlock_irq.
>
> You are correct, I missed the rcu locking implied by the
> spinlock. With this, the justification for this patch is wrong.
>
> But all of this is way too fragile and error-prone for my taste. We're
> looking up a page's lruvec in a scope that does not promise at all
> that the lruvec will be the page's. Luckily we currently don't touch
> the lruvec outside of the PageLRU branch, but this subtlety is
> entirely non-obvious from the code.
>
> I will put more thought into this. Let's scrap this patch for now.

What about the comment on mem_cgroup_page_lruvec()? I feel that
comment is good documentation, independent of the original patch.