Re: [patch 5/5] elf: Add support for loading ET_CKPT files

From: Tejun Heo
Date: Wed Oct 19 2011 - 14:22:48 EST


Hello, Pavel.

On Wed, Oct 19, 2011 at 01:03:17PM +0400, Pavel Emelyanov wrote:
> > I don't think this is a good idea. We already have most of interface
> > necessary for restoring task state and there's no need to put it into
> > the kernel as one piece. If there's something which can't be done
> > from userland using existing interfaces, let's please discuss what
> > they are and whether they can be resolved differently first.
>
> The rest API we will require will look too special-purpose (like Cyrill
> mentioned the ability to set the mm's fields such as start_code, etc).

I find that quite difficult to agree with. We're talking about some
minor additions to prctl against updating exec path to do something it
was never designed to do + new binary format.

> Besides, putting a parasite code to restore the memory state will drive
> us into certain limitations. E.g. - what address range in the target task
> address space should the parasite occupy? What if we fail in finding one,
> how should we act next?

How is that different from snapshotting time? Unless the address is
completely packed, I can't see how that can be a problem. In the
worst case, you can use the parasite to map what's not overlapping and
then let the ptracer restore the overlapping areas after parasite is
done.

> > The exec path is very intricate as it is and it would be very easy to
> > introduce security or other bugs by altering its basic behavior.
>
> Can you elaborate a bit more on this? "Other" bugs can be introduced by
> any piece of code injected into the kernel, but what about security?

exec is a pretty important security boundary. A lot of resources (fd,
VMAs, credentails) have security-sensitive policies forced across exec
boundary and there are enough places which depends on the all
resetting property of exec (e.g. no other user of thread-group wide
resources).

> > exec presumes destruction of (most of) the current process and all its
> > other threads and replacing them w/ fresh states from an executable.
>
> This is *exactly* what is expected from the restoration process.

Umm.... so why does the patch skip zapping? And if you are gonna be
zapping, how are you gonna do multiple threads?

> > I see that you removed zapping to allow restoring multi-threaded
> > process, which seems quite scary to me. It might be okay, I don't
> > know, but IMHO it just isn't a very good idea to introduce such
> > variation to basic exec behavior for this rather narrow use case.
>
> Why? Can you describe your thought on this more, maybe I'm really missing
> smth important.

Because it breaks one of the very basic assumptions - there are no
other tasks in the thread group and thus resources usually shared
among threadgroup can be assumed to be exclusive.

> > In addition, leaving it alone doesn't really solve multi-threaded
> > restoring, does it?
>
> So your main concern is about multy-threaded tasks, right? I think we can
> prepare the exec handler capable to show how this can look like.

I'm strongly against that direction regardless of implementation
details.

> > The current code doesn't seem to be doing anything but if you're gonna add
> > it later, how are you gonna synchronize other threads?
>
> Synchronize what? If we're providing to a userspace a way to put things right
> it's up to userspace to use it correctly. Like we saw this wit pid namespaces -
> there's are plenty of ways how to kill the app with CLONE_NEWPID clone flag, but
> the intention of one is not the be 100% fool-proof, but to provide an ability
> to do what userspace need.

To shared kernel data structures.

> One of the abilities to restore multy-threaded task can be - clone threads from
> the userspace, then wait for them to prepare themselves the way they need it,
> then the threads go to sleep and the leader one calls execve. Thus the userspace
> state consistency is OK.

If you're gonna let userspace do it, why bother with in-kernel thing
at all?

> > * It's still calling exec_mmap(), so the exec'ing thread will be on
> > the new mm while other threads would still be on the old one.
>
> Why do you think it's a problem?

Ummm.... they aren't on the same address space? How can that be okay?
It's not only wrong from CR POV. The kernel disallows
CLONE_THREAD/SIGHAND w/o CLONE_VM and depend on that assumption,
you're breaking that.

> > * There are operations which assume that the calling task is
> > de-threaded and thus has exclusive access to data structures which
> > can be shared among the thread group (e.g. signal struct). How are
> > accesses to them synchronized?
>
> If we're talking about keeping the userspace state solid, then stopping the
> other threads solves this.

Again, there are kernel exclusivity assumptions.

> That's a pity. As I stated above, we'll try to demonstrate the exec handler
> capable to restore the multy-threaded APP and send the 2nd RFC. Can you answer
> my questions above so we keep your concerns in mind while doing this, please.

IMHO, this is a fundamentally broken approach which isn't even
necessary. So, FWIW, NACK.

Thank you.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/