Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task

From: 朱辉
Date: Tue Oct 14 2014 - 03:01:33 EST


On 09/24/2014 11:36 PM, Rik van Riel wrote:
> On 09/22/2014 10:57 PM, Hui Zhu wrote:
>> The cause of this issue is that when free memory is low and a lot of tasks are
>> trying to shrink memory, the task that is killed by lowmemkiller cannot get
>> CPU time to exit.
>>
>> Fix this issue by changing the scheduling policy to SCHED_FIFO if a task's flag
>> is TIF_MEMDIE in lowmemkiller.
>
> Is it actually true that the task that was killed by lowmemkiller
> cannot get CPU time?

Sorry for the late reply; I wanted to do more testing around this first.
The issue is really hard to reproduce. I have a special app that
reproduces it more easily, but I still need to retry many times to
trigger it.

I found that most of the time, the task cannot be killed because it is
blocked on binder_lock.
It looks like something is wrong with a task that holds binder_lock and
is itself blocked on something else.

So I made a patch that changes a binder_lock call to binder_lock_killable
to handle this issue. (I will post it later.)
It works sometimes, but I am not sure it is right.
Only once, on a kernel with the binder patch but without the lowmemkiller
SCHED_FIFO patch, did I see a task that was not blocked on a lock while
different tasks kept calling lowmemkiller and trying to kill it.
I think the root cause in that case is that the killed task cannot get
CPU time. But I saw this only once.

>
> It is also possible that the task is busy in the kernel, for example
> in the reclaim code, and is not breaking out of some loop fast enough,
> despite the TIF_MEMDIE flag being set.
>
> I suspect SCHED_FIFO simply papers over that kind of issue, by not
> letting anything else run until the task is gone, instead of fixing
> the root cause of the problem.
>
>

Based on what I described above, I think the lowmemkiller SCHED_FIFO
patch may handle some cases of this issue.

Thanks,
Hui