Re: [RFC][PATCH 00/11] track files for checkpointability

From: Dave Hansen
Date: Fri Mar 06 2009 - 11:46:22 EST


On Fri, 2009-03-06 at 10:23 -0600, Serge E. Hallyn wrote:
> Which imo is fine, but my question is whether that leaves any actual
> value in the persistent per-resource uncheckpointable flag.

OK, let's take a look back at this discussion a little bit and how we
got here.

Ingo quotes:
> Yeah, per resource it should be. That's per task in the normal
> case - except for threaded workloads where it's shared by
> threads.

> Uncheckpointable should be a one-way flag anyway. We want this
> to become usable, so uncheckpointable functionality should be as
> painful as possible, to make sure it's getting fixed ...

> Is there any automated test that could discover C/R breakage via
> brute force? All that matters in such cases is to get the "you
> broke stuff" information as soon as possible. If it comes at an
> early stage developers can generally just fix stuff.

Add these things together and you get what I posted. My patch:
1. is per resource
2. has a one-way flag
3. gives messages to developers at an early stage (dmesg) and lets them
explore it more thoroughly (/proc)
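
To make the three properties above concrete, here is a minimal userspace
sketch. It is not the code from the patch series: struct tracked_file,
its fields, and all three function names are hypothetical, and stderr
stands in for dmesg. It only shows the shape of a per-resource flag that
can be cleared but never set again, that logs once at the moment it is
cleared, and that can be queried later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a tracked kernel resource (e.g. an open file). */
struct tracked_file {
	const char *name;
	bool may_checkpoint;	/* starts true; only ever cleared */
	const char *reason;	/* why it was cleared, for a /proc-style report */
};

void tracked_file_init(struct tracked_file *f, const char *name)
{
	assert(f != NULL);
	f->name = name;
	f->may_checkpoint = true;
	f->reason = NULL;
}

/*
 * One-way transition: clears the flag and logs the first reason.
 * Returns true only on the first call, so the message appears once
 * per resource. There is deliberately no function to set it back.
 */
bool tracked_file_deny_checkpoint(struct tracked_file *f, const char *reason)
{
	assert(f != NULL);
	if (!f->may_checkpoint)
		return false;
	f->may_checkpoint = false;
	f->reason = reason;
	/* stderr plays the role of dmesg in this sketch */
	fprintf(stderr, "%s: no longer checkpointable: %s\n", f->name, reason);
	return true;
}

/* Later inspection, playing the role of the /proc interface. */
const char *tracked_file_status(const struct tracked_file *f)
{
	assert(f != NULL);
	return f->may_checkpoint ? "checkpointable" : f->reason;
}
```

The point of the sketch is the timing: the message is emitted when the
resource first becomes uncheckpointable, not when somebody later asks
for a checkpoint.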

But these "early stage" messages are completely at odds with an approach
that uses sys_checkpoint() in some form (such as passing -1 as the fd
argument).

Think of it like lockdep. We *could* have designed lockdep to simply
give us a nice message whenever we hit an AB-BA deadlock. That would
be helpful. Or, we could design it to record all lock acquisitions that
didn't deadlock to see whether they ever could deadlock. (We did the
second one, btw.) That gives an early, useful warning that developers
can fix before we encounter an actual problem. I'm advocating such a
mechanism for c/r.
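
For anyone who hasn't looked at how the "record everything" approach
works, here is a toy userspace sketch of the idea. It is emphatically
not lockdep's implementation (no lock classes, no irq contexts, no
per-task state); note_acquire(), note_release() and the small integer
lock ids are made up for illustration. It records which lock was taken
while which other lock was held, and complains as soon as a new
acquisition contradicts an order seen earlier, even though nothing
actually deadlocked.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_LOCKS 8	/* lock ids are assumed to be 0..MAX_LOCKS-1 */

/* order[a][b] is true once lock b has been taken while a was held. */
static bool order[MAX_LOCKS][MAX_LOCKS];
static int held[MAX_LOCKS];
static int nheld;

/* Is there a recorded chain from -> ... -> to? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (order[from][next] && !seen[next] &&
		    reaches(next, to, seen))
			return true;
	return false;
}

/*
 * Record an acquisition. Returns false if it contradicts an ordering
 * recorded earlier, i.e. a deadlock is possible even if none happened.
 */
bool note_acquire(int lock)
{
	bool ok = true;

	for (int i = 0; i < nheld; i++) {
		bool seen[MAX_LOCKS] = { false };

		if (reaches(lock, held[i], seen)) {
			fprintf(stderr,
				"possible deadlock: %d taken while holding %d\n",
				lock, held[i]);
			ok = false;
		}
		order[held[i]][lock] = true;
	}
	if (nheld < MAX_LOCKS)
		held[nheld++] = lock;
	return ok;
}

/* Forget that the lock is held; the recorded ordering is kept. */
void note_release(int lock)
{
	for (int i = 0; i < nheld; i++) {
		if (held[i] == lock) {
			held[i] = held[--nheld];
			return;
		}
	}
}
```

Taking lock 0 then lock 1, releasing both, and later taking lock 1 then
lock 0 is enough to trigger the warning, with a single thread and no
real contention. That is the kind of early signal I'd like for c/r.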

-- Dave
