Re: [oom]: [0/4] fix OOM deadlock running OAST

From: Marcelo Tosatti
Date: Thu Jun 24 2004 - 09:53:59 EST


On Wed, Jun 23, 2004 at 05:26:51PM -0700, William Lee Irwin III wrote:
> William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
> >> It's a
> >> judgment call as to whether it's beneficial in general, as it does
> >> insulate userspace somewhat from needing to wait for slow IO being the
> >> ostensible cause of the allocation failure.
>
> On Wed, Jun 23, 2004 at 05:18:18PM -0700, Andrew Morton wrote:
> > mm... I can only see that happening if the IO system is retiring write
> > requests at much less than 10/sec, which seems unlikely. Still, that can
> > be tuned around.
>
> Then it sounds like the smaller fix below may be better for you.
>
>
> William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
> >> RedHat vendor kernels have removed the check entirely
>
> On Wed, Jun 23, 2004 at 05:18:18PM -0700, Andrew Morton wrote:
> > When telling us this sort of thing, please always specify the kernel version.
> > I assume you're referring to a 2.6 kernel? If so, some thwapping might be
> > in order.
>
> No, RHEL3. I'm not aware of any mm/oom_kill.c changes in any of the
> Fedora snapshots.
>
>
> -- wli
>
> During stress testing at Oracle to determine the maximum number of
> clients 2.6 can service, it was discovered that the failure mode of
> excessive numbers of clients was kernel deadlock. The following patch
> removes the check if (nr_swap_pages > 0) from out_of_memory() as this
> heuristic fails to detect memory exhaustion due to pinned allocations,
> directly causing the aforementioned deadlock.
>
>
> ===== mm/oom_kill.c 1.26 vs edited =====
> --- 1.26/mm/oom_kill.c Thu Jun 3 01:46:39 2004
> +++ edited/mm/oom_kill.c Wed Jun 23 17:22:22 2004
> @@ -230,12 +230,6 @@
>  	static unsigned long first, last, count, lastkill;
>  	unsigned long now, since;
>
> -	/*
> -	 * Enough swap space left? Not OOM.
> -	 */
> -	if (nr_swap_pages > 0)
> -		return;
> -
>  	spin_lock(&oom_lock);
>  	now = jiffies;
>  	since = now - last;
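A minimal sketch of the heuristic this patch deletes, and of why it can hide real exhaustion. It is a hypothetical model, not kernel code: `struct mem_snapshot`, its `reclaimable_pages` field and the helper functions are invented for illustration. Only `nr_swap_pages` and the "free swap means not OOM" test come from the patch. The point it shows is that free swap is only useful if something can actually be swapped out, and pinned allocations (for example kernel allocations or mlock()ed pages) cannot be.

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of memory state at the moment an allocation
 * has failed and out_of_memory() is entered. Illustrative only.
 */
struct mem_snapshot {
	long nr_swap_pages;     /* free swap pages, as in the kernel global */
	long reclaimable_pages; /* pages swap-out or writeback could free */
};

/*
 * The removed check: any free swap at all means "not OOM", so
 * out_of_memory() returns without killing anything.
 */
bool swap_check_says_not_oom(const struct mem_snapshot *s)
{
	return s->nr_swap_pages > 0;
}

/*
 * What the check implicitly assumes: free swap relieves pressure only
 * if there is still memory that can be swapped out. Pinned pages can't.
 */
bool swap_can_relieve_pressure(const struct mem_snapshot *s)
{
	return s->nr_swap_pages > 0 && s->reclaimable_pages > 0;
}

/*
 * The problem case: the check reports "not OOM" although nothing
 * reclaimable is left, so allocations keep failing without any kill.
 */
bool check_masks_exhaustion(const struct mem_snapshot *s)
{
	return swap_check_says_not_oom(s) && !swap_can_relieve_pressure(s);
}
```

For a snapshot with free swap but no reclaimable pages, both swap_check_says_not_oom() and check_masks_exhaustion() return true, which is the situation the patch description blames for the deadlock.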

Removing the check on v2.4-based kernels will trigger the OOM killer
too soon in a lot of cases, I'm pretty sure.
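Once the swap check is gone, what decides whether a kill happens is the timing logic that follows `spin_lock(&oom_lock)` in out_of_memory(), which is what Andrew's "10/sec" figure and the "too soon" concern above are about. The sketch below is a simplified, hypothetical model of that kind of failure-rate gate: `struct oom_gate`, `oom_gate_failure()`, millisecond timestamps and all four constants are assumptions, loosely modeled on the 2.6-era code rather than copied from it.

```c
#include <stdbool.h>

/* Assumed thresholds; illustrative values, not taken from any kernel. */
#define RESET_GAP_MS    5000 /* a quiet gap this long means "not OOM" */
#define MIN_SPAN_MS     1000 /* failures must persist at least this long */
#define MIN_FAILURES    10   /* ...and number at least this many */
#define KILL_BACKOFF_MS 5000 /* give the last victim time to exit */

/* Hypothetical gate state; mirrors the static first/last/count/lastkill. */
struct oom_gate {
	long first;    /* start of the current failure window (ms) */
	long last;     /* time of the previous failure (ms) */
	long lastkill; /* time of the previous kill (ms) */
	int count;     /* failures counted once the window is long enough */
};

/* Record one allocation failure at now_ms; return true if we would kill. */
bool oom_gate_failure(struct oom_gate *g, long now_ms)
{
	long since = now_ms - g->last;

	g->last = now_ms;

	if (since > RESET_GAP_MS) {
		/* Long quiet period: start a fresh window. */
		g->first = now_ms;
		g->count = 0;
		return false;
	}
	if (now_ms - g->first < MIN_SPAN_MS)
		return false; /* failing for less than the minimum span */
	if (++g->count < MIN_FAILURES)
		return false; /* too few failures so far */
	if (now_ms - g->lastkill < KILL_BACKOFF_MS)
		return false; /* killed recently; wait for it to take effect */

	/* Kill, then restart the window. */
	g->lastkill = now_ms;
	g->first = now_ms;
	g->count = 0;
	return true;
}
```

Fed a failure every 50 ms from a zeroed gate, this model first reports a kill about 1.45 s after the first failure, while failures spaced more than 5 s apart never trigger it. Without the swap check, transient failures that occur while swap is still free reach this gate too, which is one way to read the concern that the killer would fire too soon.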