Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1sticks-and-crashs)

From: David Rientjes
Date: Mon Apr 09 2012 - 17:22:37 EST


On Mon, 9 Apr 2012, Linus Torvalds wrote:

> The real bug is actually that those notifiers are a f*cking joke, and
> the return value from the notifier is a mistake.
>
> So I personally think that the real problem is this code in
> profile_handoff_task:
>
> return (ret == NOTIFY_OK) ? 1 : 0;
>
> and ask yourself two questions:
>
> - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
> - what happens if there are multiple notifiers that all (or some)
> return NOTIFY_OK?
>

NOTIFY_OK should never be a valid response for this notifier the way it's
currently implemented, it should be NOTIFY_STOP to stop iterating the call
chain to avoid a double free. Right now it doesn't matter because only
oprofile is actually freeing the task_struct and lowmemorykiller should be
using NOTIFY_DONE.

Then we have a completeness issue if multiple callbacks want to return
NOTIFY_STOP and an ordering issue if the oprofile callback is invoked
before lowmemorykiller.

> I'll tell you what my answers are:
>
> (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
> task-struct". It's also what that handoff notifier thing returns if
> there are no notifiers registered at all.
>
> So the fix to the Android lowmemorykiller is as simple as just
> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
> will properly free the task struct.
>

I don't think so for Werner's config who also has CONFIG_OPROFILE=y, so
oprofile would return NOTIFY_OK and queue the task_struct for free, then
the second notifier callback to the lowmemorykiller would return
NOTIFY_DONE which would result in put_task_struct() doing free_task()
itself for a double free.

> The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
> "NOTIFY_OK means that I will free the task myself later". That's what
> the oprofile uses, and it frees the task.
>
> (b) But the whole interface is a total f*cking mess. If *multiple*
> people return NOTIFY_OK, they're royally fucked. And the whole (and
> only) point of notifiers is that you can register multiple different
> ones independently.
>
> So quite frankly, the *real* bug is not in that android driver
> (although I'd say that we should just make it return NOTIFY_DONE and
> be done with it). The real bug is that the whole f*cking notifier is a
> mistake, and checking the error return was the biggest mistake of all.
>

Right, we can't handoff the freeing of the task_struct to more than one
notifier. It seems misdesigned from the beginning and what we really want
is to hijack task->usage for __put_task_struct(task) if we have such a
notifier callchain and require each one (currently just oprofile) to take
a reference on task->usage for NOTIFY_OK and then be responsible for
dropping the reference when it's done with it later instead of requiring
it to free the task_struct itself.

That's _if_ we want to continue to have such an interface in the first
place where it's only really necessary right now for oprofile (and, hence,
wasn't implemented in an extendable way). I'm thinking the
lowmemorykiller, as I eluded to, could be written in a way where we can
detect if a thread we've already killed has exited yet before killing
another one. We can't just store a pointer to the task_struct of the
killed task since it could be reused for a fork later, but we could use
TIF_MEMDIE like the oom killer does.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/