Re: [PATCH 3/4] OOM, PM: OOM killed task shouldn't escape PM suspend

From: Michal Hocko
Date: Thu Nov 06 2014 - 11:02:13 EST


On Thu 06-11-14 10:09:27, Tejun Heo wrote:
> On Thu, Nov 06, 2014 at 02:05:43PM +0100, Michal Hocko wrote:
> > But this is nothing new. Suspend hasn't been checking for fatal signals
> > nor for TIF_MEMDIE since the OOM disabling was introduced and I suppose
> > even before.
> >
> > This is not harmful though. The previous OOM kill attempt would kick the
> > current TASK and mark it with TIF_MEMDIE and retry the allocation. After
> > OOM is disabled the allocation simply fails. The current will die on its
> > way out of the kernel. Definitely worth fixing. In a separate patch.
>
> Hah? Isn't this a new outright A-B B-A deadlock involving the rwsem
> you added?

No, see below.

> > > disable() call must be able to fail.
> >
> > This would be a way to do it without requiring caller to check for
> > TIF_MEMDIE explicitly. The fewer of them we have the better.
>
> Why the hell would the caller ever even KNOW about this? This is
> something which must be encapsulated in the OOM killer disable/enable
> interface.
>
> > +bool oom_killer_disable(void)
> > {
> > + bool ret = true;
> > +
> > down_write(&oom_sem);
>
> How would this task pass the above down_write() if the OOM killer is
> already read locking oom_sem? Or is the OOM killer guaranteed to make
> forward progress even if the killed task can't make forward progress?
> But, if so, what are we talking about in this thread?

Yes, the OOM killer simply kicks the process, sets TIF_MEMDIE and terminates.
That releases the read lock, which lets this task take the write lock and
check, without any races, whether current has been killed.
The OOM killer doesn't wait for the killed task. The allocation is retried.

Does this explain your concern?

[...]
--
Michal Hocko
SUSE Labs