Re: [RFC PATCH] mm, oom_reaper: gather each vma to prevent leaking TLB entry

From: Michal Hocko
Date: Mon Nov 06 2017 - 05:40:16 EST


On Mon 06-11-17 17:59:54, Wangnan (F) wrote:
>
>
> On 2017/11/6 16:52, Michal Hocko wrote:
> > On Mon 06-11-17 15:04:40, Bob Liu wrote:
> > > On Mon, Nov 6, 2017 at 11:36 AM, Wang Nan <wangnan0@xxxxxxxxxx> wrote:
> > > > tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual address
> > > > space. In this case, tlb->fullmm is true. Some archs, like arm64, don't
> > > > flush the TLB when tlb->fullmm is true:
> > > >
> > > > commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1").
> > > >
> > > CC'ed Will Deacon.
> > >
> > > > This leaks TLB entries. For example, when the oom_reaper
> > > > selects a task and reaps its virtual address space, another thread
> > > > in the same task group may still be running on another core and access
> > > > the already-freed memory through stale TLB entries.
> > No threads should be running in userspace by the time the reaper gets to
> > unmap their address space. So the only potential case is that they are
> > accessing user memory from the kernel, in which case we should fault, and
> > we have MMF_UNSTABLE to cause a SIGBUS. So is the race you are describing
> > real?
> >
> > > > This patch gathers each vma instead of gathering the full vm space, so
> > > > tlb->fullmm is not true. The behavior of the oom reaper becomes similar
> > > > to munmapping before do_exit, which should be safe for all archs.
> > I do not have any objection to doing per-vma TLB flushing, because it
> > would free the gathered pages sooner, but I am not sure I see any real
> > problem here. Have you seen any real issues, or is this more of a
> > review-driven fix?
>
> We saw the problem when we tried to reuse the oom reaper's code in
> another situation. In our situation, we allow reaping a task
> before all other tasks in its task group have finished their exit
> procedure.
>
> I'd like to know: what ensures that "no threads should be running in
> userspace by the time the reaper" gets to unmap their address space?

All tasks have been killed by that time, so they should already have been
kicked out of userspace into the kernel.
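
To make the fullmm point from the patch description concrete, here is a
minimal userspace model in C. It is only a sketch, not kernel code:
struct gather_model, struct vma_model and every model_*/reap_* function
are hypothetical stand-ins. The fullmm test follows the rule stated in the
thread (a 0..-1 range means the whole address space), and
model_tlb_flush() mimics the arm64 behaviour from commit 5a7862e83000 of
returning early when fullmm is set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of struct mmu_gather: only what fullmm needs. */
struct gather_model {
	unsigned long start;
	unsigned long end;
	bool fullmm;
	unsigned int flushes;	/* TLB flushes actually issued */
};

/* Hypothetical vma address range. */
struct vma_model {
	unsigned long start;
	unsigned long end;
};

/* Models tlb_gather_mmu(): start 0 and end ~0UL mean "whole mm". */
static void model_tlb_gather_mmu(struct gather_model *tlb,
				 unsigned long start, unsigned long end)
{
	tlb->start = start;
	tlb->end = end;
	/* end + 1 wraps to 0 only for end == ~0UL */
	tlb->fullmm = !(start | (end + 1));
	tlb->flushes = 0;
}

/* Models the arm64 behaviour: skip the flush entirely when fullmm. */
static void model_tlb_flush(struct gather_model *tlb)
{
	if (tlb->fullmm)
		return;
	tlb->flushes++;
}

/* Models tlb_finish_mmu(): the gathered range is flushed here. */
static void model_tlb_finish_mmu(struct gather_model *tlb)
{
	model_tlb_flush(tlb);
}

/* Reaper before the patch: one gather over the whole address space. */
static unsigned int reap_whole_mm(void)
{
	struct gather_model tlb;

	model_tlb_gather_mmu(&tlb, 0, ~0UL);
	/* every vma would be unmapped into this single gather */
	model_tlb_finish_mmu(&tlb);
	return tlb.flushes;
}

/* Reaper after the patch: one gather per vma, so fullmm stays false. */
static unsigned int reap_per_vma(const struct vma_model *vmas, size_t n)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct gather_model tlb;

		model_tlb_gather_mmu(&tlb, vmas[i].start, vmas[i].end);
		model_tlb_finish_mmu(&tlb);
		total += tlb.flushes;
	}
	return total;
}
```

With two vmas, reap_per_vma() issues two flushes while reap_whole_mm()
issues none. The model only counts flushes; it says nothing about whether
any thread can still be running in userspace to use a stale entry.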
--
Michal Hocko
SUSE Labs