Re: kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110

From: Paweł Sikora
Date: Fri Oct 21 2011 - 18:07:50 EST


On Friday 21 of October 2011 11:07:56 Nai Xia wrote:
> On Fri, Oct 21, 2011 at 4:07 PM, Pawel Sikora <pluto@xxxxxxxx> wrote:
> > On Friday 21 of October 2011 14:22:37 Nai Xia wrote:
> >
> >> And as a side note. Since I notice that Pawel's workload may include OOM,
> >
> > my last tests on a patched (3.0.4 + migrate.c fix + vserver) kernel produce full cpu load
> > on dual 8-core opterons, as on this htop screenshot -> http://pluto.agmk.net/kernel/screen1.png
> > afaics all userspace applications usually use no more than half of physical memory,
> > and the so-called "cache" bar in htop doesn't reach 100%.
>
> OK, did you log any OOM kills when there was a memory usage burst?
> But, well, my OOM reasoning above is a direct shortcut to an imagined
> root cause: the "adjacent VMAs which
> should have been merged but in fact were not merged" case.
> Maybe there are other cases that can lead to this, or maybe it's
> a totally different bug....

i don't see any OOM kills with my conservative settings
(vm.overcommit_memory=2, vm.overcommit_ratio=100).
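For reference, those settings correspond to a sysctl fragment like the one below. With
overcommit_memory=2 the kernel enforces strict accounting: allocations beyond
CommitLimit = swap + overcommit_ratio% of RAM fail outright instead of triggering the
OOM killer later. (A sketch only; the /etc/sysctl.conf path and the commented commands
are the usual conventions, not taken from the message.)

```shell
# /etc/sysctl.conf fragment -- conservative overcommit, as in the message above
vm.overcommit_memory = 2     # strict accounting: refuse allocations past CommitLimit
vm.overcommit_ratio = 100    # CommitLimit = swap + 100% of physical RAM

# apply without a reboot (assumed invocation):
#   sysctl -p /etc/sysctl.conf
# the resulting limit and current usage are visible in /proc/meminfo:
#   grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```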

> But I still think that if my reasoning is right, similar bad things will
> happen again some time in the future,
> even if that's not your case here...
>
> >
> > the patched kernel with CONFIG_TRANSPARENT_HUGEPAGE disabled (new in 2.6.38)
> > died overnight, so now i'm going to also disable CONFIG_COMPACTION/CONFIG_MIGRATION
> > in the next steps and stress this machine again...
>
> OK, it's smart to narrow down the range first....

disabling hugepage/compaction didn't help, but disabling hugepage/compaction/migration has kept
the opterons stable for ~9h so far. userspace uses ~40GB (of 64) ram, caches reach 100% on the
htop bar, average load ~16. i wonder if it survives the weekend...
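When bisecting build options like this, it helps to confirm what each test kernel was
actually built with by grepping its config. A minimal sketch, assuming the config is
available as text (e.g. /boot/config-$(uname -r) or zcat /proc/config.gz); the heredoc
sample below stands in for a real config file:

```shell
# Stand-in for /boot/config-$(uname -r): THP on, compaction/migration off.
cat <<'EOF' > /tmp/sample-config
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_COMPACTION is not set
# CONFIG_MIGRATION is not set
EOF

# Report each option's state; "CONFIG_FOO=y" means built in,
# "# CONFIG_FOO is not set" (or absence) means disabled.
for opt in TRANSPARENT_HUGEPAGE COMPACTION MIGRATION; do
  if grep -q "^CONFIG_${opt}=y" /tmp/sample-config; then
    echo "CONFIG_${opt}: enabled"
  else
    echo "CONFIG_${opt}: disabled"
  fi
done
```

THP can also be toggled at runtime via /sys/kernel/mm/transparent_hugepage/enabled,
which avoids a rebuild for that one variable.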
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/