Re: [RFC v6][PATCH 0/9] Kernel based checkpoint/restart

From: Peter Chubb
Date: Thu Oct 16 2008 - 18:53:46 EST


>>>>> "Oren" == Oren Laadan <orenl@xxxxxxxxxxxxxxx> writes:

Oren> Daniel Lezcano wrote:

>>>
>>> The one exception (and it is a tedious one !) are states in which
>>> the task is already frozen by definition: any ptrace blocking
>>> point where the tracee waits for the tracer to grant permission to
>>> proceed with its execution. Another example is in vfork(), waiting
>>> for completion.
>> I would say these are perfect places for "may be
>> non-checkpointable" :)

Oren> For now, yes. But we definitely want this capability in the long
Oren> run; otherwise we won't be able to checkpoint a kernel compile
Oren> ('make' uses vfork), or anything with 'gdb' running inside, or
Oren> 'strace', and other goodies.

The strace/gdb example is *really* hard; but for vfork, you just wait
until it's over. The interval between vfork and exec/exit should be
short enough not to affect the overall time for a checkpoint (and
checkpoint can be fairly slow anyway --- on the HPC machines we used
to do it on, writing half a terabyte of checkpoint image to disc could take
many minutes. In hindsight, we should have multithreaded it).
Waiting for a vforked process to exec takes less than a millisecond.
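
To make the vfork window concrete, here is a rough sketch (not code from the
patch set) of how one could measure how long a parent stays suspended between
vfork() and the child's exec/exit. The helper name vfork_blocked_ns and the
choice of CLOCK_MONOTONIC are assumptions made for illustration; the number
it reports depends entirely on the machine and load.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (illustration only): vfork a child that execs
 * `path`, and return the number of nanoseconds the parent spent
 * suspended inside vfork(), or -1 on error.
 *
 * The parent does not return from vfork() until the child has called
 * exec or _exit, so the interval t0..t1 approximates the window in
 * which a checkpoint would simply have to wait.
 */
static long long vfork_blocked_ns(const char *path)
{
    struct timespec t0, t1;
    pid_t pid;
    int status;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid = vfork();
    if (pid == 0) {
        /* Child: only exec or _exit is safe after vfork. */
        execl(path, path, (char *)NULL);
        _exit(127);             /* exec failed; this also releases the parent */
    }
    /* Parent resumes here once the child has exec'd or exited. */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (pid < 0)
        return -1;              /* vfork itself failed */
    if (waitpid(pid, &status, 0) < 0)
        return -1;              /* could not reap the child */

    return (t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```

Calling vfork_blocked_ns("/bin/true") from a small main() and printing the
result gives a feel for the size of the window on a given box. If the exec
fails, the child's _exit still releases the parent, so the helper returns a
(short) interval rather than hanging.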
--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au ERTOS within National ICT Australia