Re: [RFC v6][PATCH 0/9] Kernel based checkpoint/restart

From: Oren Laadan
Date: Mon Oct 13 2008 - 12:15:22 EST




Cedric Le Goater wrote:
> Ingo Molnar wrote:
>> * Dave Hansen <dave@xxxxxxxxxxxxxxxxxx> wrote:
>>
>>> On Thu, 2008-10-09 at 15:44 +0200, Ingo Molnar wrote:
>>>> there might be races as well, especially with proxy state - and
>>>> current->flags updates are not serialized.
>>>>
>>>> So maybe it should be a completely separate flag after all? Stick it
>>>> into the end of task_struct perhaps.
>>> What do you mean by proxy state? nsproxy?
>> it's a concept: one task installing some state into another task (which
>> state must be restored after a checkpoint event), while that other task
>> is running. Such as a pi-futex state for example.
>>
>> So a task can acquire state not just by its own doing, but via some
>> other task too.
>
> thinking aloud,
>
> hmm, that's rather complex, because we have to take into account the
> kernel stack, no ? This is what Andrey was trying to solve in his patchset
> back in September :
>
> http://lkml.org/lkml/2008/9/3/96
>
> the restart phase simulates a clone and switch_to to (not) restore the kernel
> stack. right ?
>
> the self checkpoint and self restore syscalls, like Oren is proposing, are
> simpler but they require the process cooperation to be triggered. we could
> image doing that in a special signal handler which would allow us to jump
> in the right task context.

This description is not accurate:

For checkpoint, both implementations use an "external" task to read the state
from other tasks. (In my implementation that "other" task can be self).

For restart, both implementation expect the restarting process to restore its
own state. They differ in that Andrew's patchset also creates that process
while mine (at the moment) relies on the existing (self) task.

In other words, none of them will require any cooperation on part of the
checkpointed tasks, and both will require cooperation on part of the restarting
tasks (the latter is easy since we create and fully control these tasks).

>
> I don't have any preference but looking at the code of the different patchsets
> there are some tricky areas and I'm wondering which path is easier, safer,
> and portable.

I am thinking which path is preferred: create the processes in kernel space
(like Andrew's patch does) or in user space (like Zap does). In the mini-summit
we agreed in favor of kernel space, but I can still see arguments why user space
may be better. (note: I refer strictly to the creation of the processes during
restart, not how their state is restored).

any thoughts ?

Oren.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/