Re: [PATCH 09/12] oom: remove PF_EXITING check completely

From: Oleg Nesterov
Date: Fri Jun 04 2010 - 06:06:27 EST


On 06/03, David Rientjes wrote:
>
> On Fri, 4 Jun 2010, Oleg Nesterov wrote:
>
> > > > > > Currently, PF_EXITING check is completely broken. because 1) It only
> > > > > > care main-thread and ignore sub-threads
> > > > >
> > > > > Then check the subthreads.
> > > > >
> > >
> > > Did you want to respond to this?
> >
> > Please explain what you mean. There were already a lot of discussions
> > about mt issues, I do not know what you have in mind.
>
> Can you check the subthreads to see if they are not PF_EXITING?

To detect the process with the dead group leader?

Yes, we can. We already discussed this. Probably it is better to check
PF_EXITING and signal_group_exit().
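
To illustrate the difference between checking only the group leader and
checking the whole thread group, here is a minimal userspace sketch. It is
not kernel code: struct model_thread, struct model_group, MODEL_PF_EXITING
and both helpers are hypothetical names made up for this example, and the
group_exit field only stands in for what signal_group_exit() reports. It
shows one possible reading of "check the sub-threads", nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are NOT the kernel's definitions. */
#define MODEL_PF_EXITING 0x4u          /* hypothetical flag bit for this sketch */

struct model_thread {
	unsigned int flags;
};

struct model_group {
	bool group_exit;                   /* stands in for signal_group_exit() */
	const struct model_thread *threads; /* threads[0] models the group leader */
	size_t nr_threads;
};

/* Leader-only check: looks at the main thread and ignores sub-threads. */
static bool leader_is_exiting(const struct model_group *g)
{
	assert(g != NULL);
	return g->nr_threads > 0 &&
	       (g->threads[0].flags & MODEL_PF_EXITING) != 0;
}

/*
 * Whole-group check: the group counts as exiting if a group exit is in
 * progress, or if every thread in it has the exiting flag set.
 */
static bool group_is_exiting(const struct model_group *g)
{
	assert(g != NULL);
	if (g->group_exit)
		return true;
	for (size_t i = 0; i < g->nr_threads; i++) {
		if (!(g->threads[i].flags & MODEL_PF_EXITING))
			return false;          /* at least one thread is still alive */
	}
	return g->nr_threads > 0;
}
```

With a dead leader and one live sub-thread, leader_is_exiting() returns
true while group_is_exiting() returns false, which is the mismatch the
changelog complains about.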

> > > I'm guessing at the relevancy here because the changelog is extremely
> > > poorly worded (if I were Andrew I would have no idea how important this
> > > patch is based on the description other than the alarmist words of "... is
> > > completely broken"), but if we're concerned about the coredumper not being
> > > able to find adequate resources to allocate memory from, we can give it
> > > access to reserves specifically,
> >
> > I don't think so. If oom-kill wants to kill the task which dumps the
> > core, it should stop the coredumping and exit.
>
> That's a coredump change, not an oom killer change.

Yes, do_coredump() should be fixed. But this is not trivial (it needs
subtle changes outside of fs/exec.c), and we are looking for a simple fix
for now.

> If the coredumper
> needs memory and runs into the oom killer, this PF_EXITING check, which
> you want to remove, gives it access to memory reserves by setting
> TIF_MEMDIE so it can quickly finish and die. This allows it to exit
> without oom killing anything else because the tasklist scan in the oom
> killer is not preempted by finding a TIF_MEMDIE task.

David, sorry. I already tried to explain (at least twice) that TIF_MEMDIE
(or SIGKILL, even if do_coredump() were interruptible) cannot help unless
you find the right thread. This is not trivial even if we forget about
CLONE_VM tasks.

And personally I disagree that it should use memory reserves, but this
doesn't matter.


Let's stop this. You don't need to convince me. I am not the author of this
patch, and I have said many times that I do not pretend to understand
oom-kill's needs. I jumped into this discussion because your initial
objection (that fatal_signal_pending() should fix the problems) was
technically wrong.

Oleg.
