Re: can't oom-kill zap the victim's memory?

From: Michal Hocko
Date: Mon Sep 21 2015 - 12:12:13 EST


On Mon 21-09-15 17:32:52, Oleg Nesterov wrote:
> On 09/21, Michal Hocko wrote:
> >
> > On Mon 21-09-15 15:44:14, Oleg Nesterov wrote:
> > [...]
> > > So yes, in general oom_kill_process() can't call oom_unmap_func() directly.
> > > That is why the patch uses queue_work(oom_unmap_func). The workqueue thread
> > > takes mmap_sem and frees the memory allocated by user space.
> >
> > OK, this might have been a bit confusing. I didn't mean you cannot use
> > mmap_sem directly from the workqueue context. You _can_ AFAICS. But I've
> > mentioned that you _shouldn't_ use workqueue context in the first place
> > because all the workers might be blocked on locks and new workers cannot
> > be created due to memory pressure.
>
> Yes, yes, and I already tried to comment this part.

OK then we are on the same page, good.

> We probably need a
> dedicated kernel thread, but I still think (although I am not sure) that
> initial change can use workqueue. In the likely case system_unbound_wq pool
> should have an idle thread, if not - OK, this change won't help in this
> case. This is minor.

The point is that the implementation should be robust from the very
beginning. I am not sure what you mean by the idle thread here but the
rescuer can get stuck the very same way as other workers. So I think that
we cannot rely on WQ for a real solution here.

> > So I think we probably need to do this in the OOM killer context (with
> > try_lock)
>
> Yes we should try to do this in the OOM killer context, and in this case
> (of course) we need trylock. Let me quote my previous email:
>
> And we want to avoid using workqueues when the caller can do this
> directly. And in this case we certainly need trylock. But this needs
> some refactoring: we do not want to do this under oom_lock,

Why do you think oom_lock would be a big deal? Address space of the
victim might be really large but we can back off after a batch of
unmapped pages.
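
To illustrate the trylock-plus-batching idea, here is a minimal userspace
sketch. It is not kernel code and not taken from any patch in this thread:
struct fake_mm, reap_batch() and reap_victim() are hypothetical names, an
atomic flag stands in for mmap_sem, and decrementing a counter stands in
for the actual unmapping. The only point is the control flow: never sleep
on the lock, do a bounded amount of work per attempt, and back off as soon
as no progress is made.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the victim's address space. */
struct fake_mm {
	atomic_flag busy;	/* plays the role of mmap_sem */
	size_t pages_left;	/* private pages still mapped */
};

/*
 * Try to release up to `batch` pages without ever blocking.
 * Returns the number of pages released, or 0 if the lock is
 * contended or there is nothing left to release.
 */
static size_t reap_batch(struct fake_mm *mm, size_t batch)
{
	size_t n;

	assert(batch > 0);

	/* trylock: give up immediately if somebody else holds it */
	if (atomic_flag_test_and_set(&mm->busy))
		return 0;

	n = mm->pages_left < batch ? mm->pages_left : batch;
	mm->pages_left -= n;	/* stands in for the real unmap work */

	atomic_flag_clear(&mm->busy);
	return n;
}

/*
 * Reap in batches until `limit` pages have been released or a
 * batch makes no progress (contended lock or empty address space).
 */
static size_t reap_victim(struct fake_mm *mm, size_t batch, size_t limit)
{
	size_t total = 0;

	while (total < limit) {
		size_t want = limit - total < batch ? limit - total : batch;
		size_t got = reap_batch(mm, want);

		if (got == 0)
			break;	/* back off instead of waiting */
		total += got;
	}
	return total;
}
```

The lock is dropped between batches, so a large address space does not turn
into one long critical section, and the `limit` argument caps the total
amount released per call.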

> otoh it
> makes sense to do this from mark_oom_victim() if current && killed,
> and a lot more details.
>
> and probably this is another reason why we need MMF_MEMDIE. But again,
> I think the initial change should be simple.

I definitely agree with the simplicity for the first iteration. That
means only unmap private exclusive pages and release at most a few megs of
them. I am still not sure about some details, e.g. a futex sitting in such
memory. Wouldn't threads blow up when they see an unmapped futex page,
try to page it in and find it in an uninitialized state? Maybe this
is safe because they will die anyway but I am not familiar with that
code.
--
Michal Hocko
SUSE Labs