Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1sticks-and-crashs)

From: Greg Kroah-Hartman
Date: Mon Apr 09 2012 - 18:21:30 EST


On Mon, Apr 09, 2012 at 03:13:00PM -0700, Colin Cross wrote:
> On Mon, Apr 9, 2012 at 2:22 PM, David Rientjes <rientjes@xxxxxxxxxx> wrote:
> > On Mon, 9 Apr 2012, Linus Torvalds wrote:
> >
> >> The real bug is actually that those notifiers are a f*cking joke, and
> >> the return value from the notifier is a mistake.
> >>
> >> So I personally think that the real problem is this code in
> >> profile_handoff_task:
> >>
> >>         return (ret == NOTIFY_OK) ? 1 : 0;
> >>
> >> and ask yourself two questions:
> >>
> >>  - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
> >>  - what happens if there are multiple notifiers that all (or some)
> >> return NOTIFY_OK?
> >>
> > NOTIFY_OK should never be a valid response for this notifier the way it's
> > currently implemented, it should be NOTIFY_STOP to stop iterating the call
> > chain to avoid a double free.  Right now it doesn't matter because only
> > oprofile is actually freeing the task_struct and lowmemorykiller should be
> > using NOTIFY_DONE.
> >
> > Then we have a completeness issue if multiple callbacks want to return
> > NOTIFY_STOP and an ordering issue if the oprofile callback is invoked
> > before lowmemorykiller.
> >
> >> I'll tell you what my answers are:
> >>
> >>  (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
> >> task-struct". It's also what that handoff notifier thing returns if
> >> there are no notifiers registered at all.
> >>
> >>      So the fix to the Android lowmemorykiller is as simple as just
> >> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
> >> will properly free the task struct.
> >>
> >
> > I don't think so for Werner's config who also has CONFIG_OPROFILE=y, so
> > oprofile would return NOTIFY_OK and queue the task_struct for free, then
> > the second notifier callback to the lowmemorykiller would return
> > NOTIFY_DONE which would result in put_task_struct() doing free_task()
> > itself for a double free.
> >
> >>      The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
> >> "NOTIFY_OK means that I will free the task myself later". That's what
> >> the oprofile uses, and it frees the task.
> >>
> >>  (b) But the whole interface is a total f*cking mess. If *multiple*
> >> people return NOTIFY_OK, they're royally fucked. And the whole (and
> >> only) point of notifiers is that you can register multiple different
> >> ones independently.
> >>
> >> So quite frankly, the *real* bug is not in that android driver
> >> (although I'd say that we should just make it return NOTIFY_DONE and
> >> be done with it). The real bug is that the whole f*cking notifier is a
> >> mistake, and checking the error return was the biggest mistake of all.
> >>
> >
> > Right, we can't handoff the freeing of the task_struct to more than one
> > notifier.  It seems misdesigned from the beginning and what we really want
> > is to hijack task->usage for __put_task_struct(task) if we have such a
> > notifier callchain and require each one (currently just oprofile) to take
> > a reference on task->usage for NOTIFY_OK and then be responsible for
> > dropping the reference when it's done with it later instead of requiring
> > it to free the task_struct itself.
> >
> > That's _if_ we want to continue to have such an interface in the first
> > place where it's only really necessary right now for oprofile (and, hence,
> > wasn't implemented in an extendable way).  I'm thinking the
> > lowmemorykiller, as I eluded to, could be written in a way where we can
> > detect if a thread we've already killed has exited yet before killing
> > another one.  We can't just store a pointer to the task_struct of the
> > killed task since it could be reused for a fork later, but we could use
> > TIF_MEMDIE like the oom killer does.
>
> This was a known issue in 2010, in the android tree the use of
> task_handoff_register was dropped one day after it was added and
> replaced with a new task_free_register hook. I assume Greg dropped
> the fix during the android tree refresh in 3.0 because it depended on
> a change to kernel/fork.c. The two relevant patches are (using
> codeaurora's gitweb becase we don't have one right now):
>
> sched: Add a generic notifier when a task struct is about to be freed
> https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a

Yes, I can't add a patch like that for this driver, that is why I
thought everyone was getting together to "properly" determine how to
solve this oom notifier problem. Has that work stalled somwhere?

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/