Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1sticks-and-crashs)

From: David Rientjes
Date: Mon Apr 09 2012 - 06:15:14 EST


On Sun, 8 Apr 2012, Linus Torvalds wrote:

> Or does anybody see anything that keeps thread counts raised so that
> "free_task()" doesn't get done. kernel/profoe.c does that
> "profile_handoff_task()" thing - but only oprofile and the android
> low-memory-killer logic seems to use it though. But that's exactly the
> kind of thing that Werner's "configure everything" might enable -
> Werner?
>

I think you nailed it.

I suspect the problem is 1eda5166c764 ("staging: android/lowmemorykiller:
Don't unregister notifier from atomic context") merged during the 3.4
merge window and, unfortunately, backported to stable.

Werner's config has CONFIG_ANDROID_LOW_MEMORY_KILLER=y so we never
actually unregister the callback for the task handoff as a result of the
patch. It's supposed to take responsibility for doing free_task() itself
when it's good and ready, usually by putting it into a list to free, but
now we're just doing this:

struct task_struct *task = data;

if (task == lowmem_deathpending)
lowmem_deathpending = NULL;

return NOTIFY_OK;

whenever put_task_struct() decrements the refcount to 0 and thus they get
leaked and bad things happen.

This is confirmed by Werner's oom log that shows extremely small values
for the oom score of the task chosen to oom kill. His first log showed X
being killed with a score of 29. That means it is the most memory-hogging
task on the system and is only using 2.9% of total system memory.

I can't actually see how the lowmemorykiller actually ever freed any
task_struct after unregistering the notifier during the callback. It
seems like this has always leaked memory but it used to happen much more
slowly because, prior to the patch, we did task_handoff_unregister() in
the callback. So I think the code was always wrong but now it's out of
control because the notifier remains enabled indefinitely. I can't say
the 1eda5166c764 ("staging: android/lowmemorykiller: Don't unregister
notifier from atomic context") commit is fully to blame, it just made the
error much more egregious.

As it sits in 3.4-rc2, this whole lowmem_deathpending business seems to be
storing a pointer to the task_struct of something sent a SIGKILL and it
remains that way until the lowmem_deathpending_timeout expires and
something else is killed instead. lowmem_deathpending gets cleared on the
task handoff if the task selected for kill just exited. This ensures we
only kill one thread at a time.

That's all fine and good but it seems like we're never freeing the
task_struct itself on exit. This seems like the most obvious fix but it
would be really nice to revisit this and remove the dependency on
CONFIG_PROFILING and just check if the lowmem_deathpending thread is found
in the iteration for lowmem_shrink() prior to killing.


android, lowmemorykiller: free task struct on profiling handoff

The lowmemorykiller stores a pointer to a killed thread's task_struct in
lowmem_deathpending when profiling is enabled. When put_task_struct()
results in the refcount going to 0, the task_notify_func() callback clears
lowmem_deathpending if it is the thread that was killed last. This
prevents additional killing until lowmem_deathpending_timeout elapses.

The responsibility of every task handoff notifier is to free the tasks
handed off to it, however, and this was being neglected, which results
in a massive memory leak since no task_struct ever gets freed.

Fix that by freeing the task_struct since we no longer need a reference to
it.

Reported-by: werner <w.landgraf@xxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
---
drivers/staging/android/lowmemorykiller.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -78,6 +78,7 @@ task_notify_func(struct notifier_block *self, unsigned long val, void *data)

if (task == lowmem_deathpending)
lowmem_deathpending = NULL;
+ free_task(task);

return NOTIFY_OK;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/