Re: oom() _still_ killing init

Andrea Arcangeli (andrea@suse.de)
Thu, 17 Jun 1999 22:23:07 +0200 (CEST)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Alan Cox: "Re: scheduling latencies, [patch] `cp /dev/zero /tmp' (patch against 2.2.9)"
Previous message: Benno Senoner: "Re: scheduling latencies, [patch] `cp /dev/zero /tmp' (patch against 2.2.9)"
Next in thread: Alan Cox: "Re: scheduling latencies, [patch] `cp /dev/zero /tmp' (patch against 2.2.9)"
Reply: Alan Cox: "Re: scheduling latencies, [patch] `cp /dev/zero /tmp' (patch against 2.2.9)"

On Thu, 17 Jun 1999, Rik van Riel wrote:

>On Wed, 16 Jun 1999, Andrea Arcangeli wrote:
>> On Wed, 16 Jun 1999, Oliver Xymoron wrote:
>>
>> >I upgraded from 2.2.5 to 2.2.10-pre-no-way-of-telling-anymore and an out
>> >of memory condition[1] killed init again. It would be nice if this weren't
>>
>> void oom(struct task_struct * task)
>> {
>> + if (task->pid == 1)
>> + goto out;
>> printk("\nOut of memory for %s.\n", task->comm);
>> force_sig(SIGKILL, task);
>> + out:
>> }
>
>You're not serious about this, are you?

People gets init killed with 2.2.x (not at all with my VM patches). And
infact if you don't do the above you can easily kill init due OOM.

>> (it's against 2.2.10_andrea-VM5)
>
>If such a horrible kludge is needed with your VM patches,

My VM patches has nothing to do with the above patch. But incidentally the
above patch will work _only_ with my vm patches because 2.2.x won't call
oom() if the system goes OOM but it will silenty send a sigbus that may be
trapped from a malicious user btw. (so in 2.2.x you should do the check
for current->pid even in do_page_fault...)

>then something must be seriously wrong with them. Besides,
>even if a system really runs out of memory, you don't want
>the OS to handle it in such a random way that even init is
>in danger...

To make init not in danger you only have to add such simple check
infact. That's what you have to do also in any kind of heuristic.

My guess about the init problem is that if you play with `login` or if an
init-child gets killed while the machine is OOM you may cause init to
alloc memory.

Now I think that killing the faulting task is a too much high risk.
Killing the tasks that belongs to the higher mm in the system (exluding
init) is at least a bit better and may avoid damages in a "normal" system.

Rik could you send me your latest version of the OOM-killer patch? I would
like to have a look at it (I still have an old version and I don't know if
you released any newer version). I may integrate your sure better
heuristic in the oom() call (I could replace my silly OOM-killer that I
written yesterday from scratch), do you think it would be a good idea? My
oom() now looks like this (so you don't need to download VM6 only to look
at this bit of code):

void oom(struct task_struct * task)
{
struct task_struct * tsk;
struct mm_struct * mm = NULL;
unsigned long max = 0;

read_lock(&tasklist_lock);
for_each_task(tsk)
{
if (tsk->pid == 1)
continue;
if (tsk->mm->total_vm > max)
{
mm = tsk->mm;
max = mm->total_vm;
}
}
if (mm)
{
for_each_task(tsk)
{
if (tsk->mm == mm)
{
printk("\nOut of memory for %s.\n", tsk->comm);
force_sig(SIGKILL, tsk);
}
}
}
read_unlock(&tasklist_lock);
}

It's sure enough here but I know that there may be cases were we may kill
the wrong proggy. Anyway I think it's better the above _simple_ heuristic
than to kill the "random" faulting task.

Andrea Arcangeli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Alan Cox: "Re: scheduling latencies, [patch] `cp /dev/zero /tmp' (patch against 2.2.9)"
Previous message: Benno Senoner: "Re: scheduling latencies, [patch] `cp /dev/zero /tmp' (patch against 2.2.9)"
Next in thread: Alan Cox: "Re: scheduling latencies, [patch] `cp /dev/zero /tmp' (patch against 2.2.9)"
Reply: Alan Cox: "Re: scheduling latencies, [patch] `cp /dev/zero /tmp' (patch against 2.2.9)"