Re: [PATCH 10/10] Add support for multiple processes

From: Oren Laadan
Date: Mon Oct 27 2008 - 11:58:39 EST




Andrey Mirkin wrote:
> The whole tree of processes can be checkpointed and restarted now.
> Shared objects are not supported yet.
>
> Signed-off-by: Andrey Mirkin <major@xxxxxxxxxx>
> ---
> checkpoint/cpt_image.h | 2 +
> checkpoint/cpt_process.c | 24 +++++++++++++
> checkpoint/rst_process.c | 85 +++++++++++++++++++++++++++-------------------
> 3 files changed, 76 insertions(+), 35 deletions(-)
>
> diff --git a/checkpoint/cpt_image.h b/checkpoint/cpt_image.h
> index e1fb483..f370df2 100644
> --- a/checkpoint/cpt_image.h
> +++ b/checkpoint/cpt_image.h
> @@ -128,6 +128,8 @@ struct cpt_task_image {
> __u64 cpt_nivcsw;
> __u64 cpt_min_flt;
> __u64 cpt_maj_flt;
> + __u32 cpt_children_num;
> + __u32 cpt_pad;
> } __attribute__ ((aligned (8)));
>
> struct cpt_mm_image {
> diff --git a/checkpoint/cpt_process.c b/checkpoint/cpt_process.c
> index 1f7a54b..d73ec3c 100644
> --- a/checkpoint/cpt_process.c
> +++ b/checkpoint/cpt_process.c
> @@ -40,6 +40,19 @@ static unsigned int encode_task_flags(unsigned int task_flags)
>
> }
>
> +static int cpt_count_children(struct task_struct *tsk, struct cpt_context *ctx)
> +{
> + int num = 0;
> + struct task_struct *child;
> +
> + list_for_each_entry(child, &tsk->children, sibling) {
> + if (child->parent != tsk)
> + continue;
> + num++;
> + }
> + return num;
> +}
> +

I noticed that don't take the appropriate locks when browsing through
tasks lists (siblings, children, global list). Although I realize that
the container should be frozen at this time, I keep wondering if this
is indeed always safe.

For instance, are you protected against an OOM killer that might just
occur uninvited and kill one of those tasks ?

Can the administrator force an un-freeze of the container ? Or perhaps
some error condition if the kernel cause that ?

> int cpt_dump_task_struct(struct task_struct *tsk, struct cpt_context *ctx)
> {
> struct cpt_task_image *t;
> @@ -102,6 +115,7 @@ int cpt_dump_task_struct(struct task_struct *tsk, struct cpt_context *ctx)
> t->cpt_egid = tsk->egid;
> t->cpt_sgid = tsk->sgid;
> t->cpt_fsgid = tsk->fsgid;
> + t->cpt_children_num = cpt_count_children(tsk, ctx);
>
> err = ctx->write(t, sizeof(*t), ctx);
>
> @@ -231,6 +245,16 @@ int cpt_dump_task(struct task_struct *tsk, struct cpt_context *ctx)
> err = cpt_dump_fpustate(tsk, ctx);
> if (!err)
> err = cpt_dump_registers(tsk, ctx);
> + if (!err) {
> + struct task_struct *child;
> + list_for_each_entry(child, &tsk->children, sibling) {
> + if (child->parent != tsk)
> + continue;
> + err = cpt_dump_task(child, ctx);
> + if (err)
> + break;

Here too.

[...]

Oren.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/