Re: oomkillers gone wild.

From: Thomas Gleixner
Date: Fri Jun 08 2012 - 16:37:15 EST


On Fri, 8 Jun 2012, David Rientjes wrote:
> On Tue, 5 Jun 2012, Dave Jones wrote:
>
> >   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> > 142524 142420  99%    9.67K  47510        3   1520320K task_struct
> > 142560 142417  99%    1.75K   7920       18    253440K signal_cache
> > 142428 142302  99%    1.19K   5478       26    175296K task_xstate
> > 306064 289292  94%    0.36K   6956       44    111296K debug_objects_cache
> > 143488 143306  99%    0.50K   4484       32     71744K cred_jar
> > 142560 142421  99%    0.50K   4455       32     71280K task_delay_info
> > 150753 145021  96%    0.45K   4308       35     68928K kmalloc-128
> >
> > Why so many task_structs? There are only 128 processes running, and most of them
> > are kernel threads.
> >
>
> Do you have CONFIG_OPROFILE enabled?
>
> > /sys/kernel/slab/task_struct/alloc_calls shows..
> >
> > 142421 copy_process.part.21+0xbb/0x1790 age=8/19929576/48173720 pid=0-16867 cpus=0-7
> >
> > I get the impression that the oom-killer hasn't cleaned up properly after killing some of
> > those forked processes.
> >
> > Any thoughts?
> >
>
> If we're leaking task_structs, meaning that put_task_struct() isn't
> actually freeing them when the refcount goes to 0, then it's certainly not
> because of the oom killer, which only sends a SIGKILL to the selected
> process.

I rather suspect that this is an asymmetry between get_task_struct()
and put_task_struct(), so the refcount just never goes to zero.
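
To illustrate what such an imbalance looks like, here is a minimal
userspace sketch. It is not kernel code: struct task, task_alloc(),
task_get(), task_put() and the live_tasks counter are hypothetical
stand-ins for task_struct and its get/put helpers. It only shows that
a single get without a matching put keeps the object allocated
forever, even though the free path itself works.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object; NOT the kernel's task_struct. */
struct task {
	int usage;		/* reference count */
};

static int live_tasks;		/* objects allocated and not yet freed */

static struct task *task_alloc(void)
{
	struct task *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->usage = 1;		/* the creator holds the first reference */
	live_tasks++;
	return t;
}

static void task_get(struct task *t)
{
	t->usage++;
}

static void task_put(struct task *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0) {	/* last reference dropped: free the object */
		free(t);
		live_tasks--;
	}
}

/* Every get has a matching put. Returns live_tasks afterwards. */
static int balanced_path(void)
{
	struct task *t = task_alloc();

	if (!t)
		return live_tasks;
	task_get(t);
	task_put(t);
	task_put(t);		/* creator's reference; count hits 0, object freed */
	return live_tasks;
}

/* One get is never paired with a put. Returns live_tasks afterwards. */
static int unbalanced_path(void)
{
	struct task *t = task_alloc();

	if (!t)
		return live_tasks;
	task_get(t);		/* this reference is never dropped */
	task_put(t);		/* creator's reference; count stays at 1 */
	return live_tasks;	/* the object is leaked on purpose */
}
```

Calling balanced_path() leaves live_tasks unchanged, while each call
to unbalanced_path() leaves one more object behind. That matches the
pattern of slab objects that stay ACTIVE long after their processes
are gone.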

Thanks,

tglx

