Re: [RFC v6][PATCH 0/9] Kernel based checkpoint/restart

From: Daniel Lezcano
Date: Thu Oct 16 2008 - 08:36:43 EST


Oren Laadan wrote:
Cedric Le Goater wrote:
Dave Hansen wrote:
On Mon, 2008-10-13 at 10:13 +0200, Cedric Le Goater wrote:
Hmm, that's rather complex, because we have to take the kernel stack into account, no? This is what Andrey was trying to solve in his patchset back in September:

http://lkml.org/lkml/2008/9/3/96

The restart phase simulates a clone and a switch_to to (not) restore the kernel stack, right?
Do we ever have to worry about the kernel stack if we simply say that
tasks have to be *in* userspace when we checkpoint them?
At a syscall boundary, for example. That would definitely make our life easier.


The ideal situation is to never worry about the kernel stack: either we
catch the task in user space or at a syscall boundary. This is taken care
of by freezing the tasks prior to checkpoint.

The one exception (and it is a tedious one!) is the set of states in
which the task is already frozen by definition, such as any ptrace
blocking point where the tracee waits for the tracer to grant permission
to proceed with its execution. Another example is a task in vfork(),
waiting for completion.

I would say these are perfect places for "may be non-checkpointable" :)

In both cases, there will be a kernel stack and we cannot avoid it.
The bad news is that it may be a bit tedious to restart these cases.
The good news, however, is that they are very well defined locations
with well defined semantics. So upon restart all that is needed is
to emulate the expected behavior had we not been checkpointed. This,
luckily, does not require rebuilding the kernel stack, but instead
some smart glue code for a finite set of special cases.
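
To make the "finite set of special cases" concrete, here is a minimal
user-space C sketch of the idea. It is only an illustration under stated
assumptions: the names ckpt_point, restart_action, needs_glue() and
restart_action_for() are hypothetical and do not come from the patchset.
It maps the point at which a task was caught at checkpoint time to what
restart would have to do, without rebuilding a kernel stack in any case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: where the task was found when it was checkpointed. */
enum ckpt_point {
    CKPT_IN_USERSPACE,      /* frozen while running user code */
    CKPT_SYSCALL_BOUNDARY,  /* frozen at syscall entry or exit */
    CKPT_PTRACE_STOP,       /* tracee blocked, waiting for its tracer */
    CKPT_VFORK_WAIT,        /* task in vfork(), waiting for completion */
    CKPT_POINT_COUNT
};

/* Hypothetical: what the restart path does for such a task. */
enum restart_action {
    RESTART_RESUME_REGS,    /* restore user registers and return */
    RESTART_EMULATE_PTRACE, /* block again until the tracer lets us go */
    RESTART_EMULATE_VFORK   /* wait again for the vfork completion */
};

/* True for the well-defined locations that need glue code on restart. */
bool needs_glue(enum ckpt_point p)
{
    return p == CKPT_PTRACE_STOP || p == CKPT_VFORK_WAIT;
}

/*
 * Pick the restart action. The special cases emulate the behavior the
 * task would have seen had it not been checkpointed; every other case
 * simply resumes from the saved user-space state.
 */
enum restart_action restart_action_for(enum ckpt_point p)
{
    assert((int)p >= 0 && p < CKPT_POINT_COUNT);

    switch (p) {
    case CKPT_PTRACE_STOP:
        return RESTART_EMULATE_PTRACE;
    case CKPT_VFORK_WAIT:
        return RESTART_EMULATE_VFORK;
    default:
        return RESTART_RESUME_REGS;
    }
}
```

A restart loop would call restart_action_for() once per task; only the
two glue cases differ from the common path, which is why the set of
special cases stays small and enumerable.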