Re: [ANNOUNCE] Minneapolis Cluster Summit, July 29-30

From: Steven Dake
Date: Mon Jul 12 2004 - 13:24:09 EST


On Sun, 2004-07-11 at 21:23, Daniel Phillips wrote:
> On Monday 12 July 2004 00:08, Steven Dake wrote:
> > On Sun, 2004-07-11 at 12:44, Daniel Phillips wrote:
> > Oom conditions are another fact of life for poorly sized systems. If
> > a cluster is within an OOM condition, it should be removed from the
> > cluster (because it is in overload, under which unknown and generally
> > bad behaviors occur).
>
> You missed the point. The memory deadlock I pointed out occurs in
> _normal operation_. You have to find a way around it, or kernel
> cluster services win, plain and simple.
>

The bottom line is that we just don't know if any such deadlock occurs,
under normal operations. The remaining objections to in-kernel cluster
services give us alot of reason to test out a userland approach.

I propose after a distributed lock service is implemented in user space,
to add support for such a project into the gfs and remaining redhat
storage cluster services trees. This will give us real data on
performance and reliability that we can't get by guessing.

Thanks
-steve


> > current code. If at a later time the processor can reenter the
> > membership because it has freed up some memory, it will do so
> > correctly.
>
> Think about it. Do you want nodes spontaneously falling over from time
> to time, even though nothing is wrong with them? What does that do
> your 5 nines?
>
> > > > I would rather avoid non-mainline kernel dependencies at this
> > > > time as it makes adoption difficult until kernel patches are
> > > > merged into upstream code. Who wants to patch their kernel to
> > > > try out some APIs?
> > >
> > > Everybody working on clusters. It's a fact of life that you have
> > > to apply patches to run cluster filesystems right now. Production
> > > will be a different story, but (except for the stable GFS code on
> > > 2.4) nobody is close to that.
> >
> > Perhaps people skilled in running pre-alpha software would consider
> > patching a kernel to "give it a run". I have no doubts about that.
> >
> > I would posit a guess people interested in implementing production
> > clusters are not too interested about applying kernel patches (and
> > causing their kernel to become unsupported) to achieve clustering
> > support any time soon.
>
> We are _far_ from production, at least on 2.6. At this point, we are
> only interested in people who like to code, test, tinker, and be the
> first kid on the block with a shiny new storage cluster in their rec
> room. And by "we" I mean "you, me, and everybody else who hopes that
> Linux will kick butt in clusters, in the 2.8 time frame."
>
> > > > I am doubtful these sort of kernel patches will be merged without
> > > > a strong argument of why it absolutely must be implemented in the
> > > > kernel vs all of the counter arguments against a kernel
> > > > implementation.
> > >
> > > True. Do you agree that the PF_MEMALLOC argument is a strong one?
> >
> > out of memory overload is a sucky situation poorly handled by any
> > software, kernel, userland, embedded, whatever.
>
> In case you missed it above, please let me point out one more time that
> I am not talking about OOM. I'm talking about a deadlock that may come
> up even when a resource usage is well within limits, which is inherent
> in the basic design of Linux. There is nothing Byzantine about it.
>
> Regards,
>
> Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/