Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

From: Grant Likely
Date: Mon Nov 08 2010 - 11:56:09 EST


On Tue, Nov 2, 2010 at 3:30 PM, Oren Laadan <orenl@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> Following the discussion yesterday, here is a linux-cr diff that
> is limited to changes to existing code.
>
> The diff doesn't include the eclone() patches. I also tried to strip
> off the new c/r code (either code in new files, or new code within
> #ifdef CONFIG_CHECKPOINT in existing files).
>
> I left a few such snippets in, e.g. c/r syscalls templates and
> declaration of c/r specific methods in, e.g. file_operations.
>
> The remaining changes in this patch include a new freezer state
> ("CHECKPOINTING"), mostly refactoring of existing code, and a few
> new helpers.
>
> Disclaimer: don't try to compile (or apply) - this is only intended
> to give a ballpark of how the c/r patches change existing code.
[...]
>  159 files changed, 2031 insertions(+), 587 deletions(-)

FWIW...

This patch has far-reaching changes which, quite frankly, scare me,
primarily because c/r changes many long-held assumptions about how
Linux processes work. It needs to track a large amount of state with
lots of corner cases, and the Linux process model is already quite
complex. I know this is a fluffy hand-waving critique, but without
being convinced of a strong general-purpose use-case, it is hard to
get excited about a solution that touches large amounts of common
code.

c/r of desktop processes doesn't seem interesting other than as a test
case. I can possibly be convinced about HPC, embedded, industrial, or
telecom use-cases, but for custom/specific-purpose applications the
question must be asked whether a fully user space or joint user/kernel
method would better solve the problem.

You mentioned in a reply that this overview diff includes both
cleanups and required changes. I suggest posting the cleanup patches
as soon as possible so that this diff becomes simpler.

Also:

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 9458685..335a4b3 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -93,6 +93,10 @@ config STACKTRACE_SUPPORT
>  config HAVE_LATENCYTOP_SUPPORT
>  	def_bool y
>
> +config CHECKPOINT_SUPPORT
> +	bool
> +	default y
> +

Definitely should not default to 'y', and needs to be user-selectable.
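
As a rough, untested sketch of what I mean (the prompt string and help
text below are placeholders of mine, not taken from the linux-cr tree):

```kconfig
# Illustrative only: a prompt string makes the option user-selectable,
# and it stays off unless someone explicitly turns it on.
config CHECKPOINT_SUPPORT
	bool "Checkpoint/restart support (placeholder prompt)"
	default n
	help
	  Placeholder help text. Say N unless you know you need
	  checkpoint/restart.
```

A bool with no 'default' line also defaults to 'n'; it is spelled out
here only to make the intent obvious.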

g.