Re: [patch 1/7 -mm] oom: filter tasks not sharing the same cpuset

From: KOSAKI Motohiro
Date: Mon Feb 15 2010 - 23:52:26 EST


> On Mon, 15 Feb 2010, KOSAKI Motohiro wrote:
>
> > > Tasks that do not share the same set of allowed nodes with the task that
> > > triggered the oom should not be considered as candidates for oom kill.
> > >
> > > Tasks in other cpusets with a disjoint set of mems would be unfairly
> > > penalized otherwise because of oom conditions elsewhere; an extreme
> > > example could unfairly kill all other applications on the system if a
> > > single task in a user's cpuset sets itself to OOM_DISABLE and then uses
> > > more memory than allowed.
> > >
> > > Killing tasks outside of current's cpuset rarely would free memory for
> > > current anyway.
> > >
> > > Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> >
> > This patch does the right thing and looks promising, but unfortunately
> > I have to NAK it temporarily.
> >
> > This patch is nearly a plain revert of commit 7887a3da75. We have to
> > dig through the old mail archives and find out why reverting it won't
> > bring back the old pain.
> >
>
> Nick is probably wondering why I cc'd him on this patchset, and this is it
> :)

Good decision :)

>
> We now determine whether an allocation is constrained by a cpuset by
> iterating through the zonelist and checking
> cpuset_zone_allowed_softwall(). This checks for the necessary cpuset
> restrictions that we need to validate (the GFP_ATOMIC exception is
> irrelevant; we don't call into the oom killer for those). We don't need
> to kill outside of its cpuset because we're not guaranteed to find any
> memory on those nodes. In fact, doing so allows needless oom killing if a
> task sets all of its threads to have OOM_DISABLE in its own cpuset and
> then runs out of memory. The oom killer would have killed every other
> user task on the system even though the offending application can't
> allocate there. That's certainly an undesired result and needs to be
> fixed in this manner.

But this explanation is irrelevant here. A cpuset's allowed nodes can change
dynamically, so tsk->mempolicy at oom time doesn't represent where the
task's memory actually resides. Plus, OOM_DISABLE can always produce
undesirable results; it's not special to this case.

The fact is, both the current heuristic and yours have corner cases; that's
obvious (I have never seen a heuristic without corner cases). So arguing
your patch's merits doesn't help get it merged. The most important thing
is that we avoid regressions. Personally, I lean toward yours, but that
doesn't mean we can ignore its drawbacks.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/