Re: [PATCH] prctl: remove one-shot limitation for changing exe link

From: Cyrill Gorcunov
Date: Sun Jul 31 2016 - 18:45:19 EST


On Sat, Jul 30, 2016 at 12:31:40PM -0500, Eric W. Biederman wrote:
> Cyrill Gorcunov <gorcunov@xxxxxxxxx> writes:
>
> > On Mon, Jul 25, 2016 at 02:56:43PM -0500, Eric W. Biederman wrote:
> > ...
> >> >>
> >> >> Also there is a big fat bug in prctl_set_mm_exe_file. It doesn't
> >> >> validate that the new file is actually an mmaped executable. We would
> >> >> definitely need that to be fixed before even considering removing the
> >> >> limit.
> >> >
> >> > Could you please elaborate? We check that the inode is executable;
> >> > what else is needed?
> >>
> >> That the inode is mmaped into the process with executable mappings.
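
In the kernel, the check Eric describes would walk the process's VMAs. As a rough userspace illustration only, the sketch below scans /proc/self/maps for executable (`x`) mappings whose device and inode match a given file. The helper names `maps_line_maps_file_exec()` and `count_exec_mappings()` are hypothetical. The code assumes that the device and inode printed in maps are the same ones stat() reports for the file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical helper: parse one /proc/<pid>/maps line of the form
 * "start-end perms offset major:minor inode [path]" and return 1 if it
 * is an executable mapping of the file identified by (major, minor, ino).
 */
static int maps_line_maps_file_exec(const char *line, unsigned int major_dev,
                                    unsigned int minor_dev, unsigned long ino)
{
    char perms[5];
    unsigned int maj, min;
    unsigned long line_ino;

    /* Skip the address range and offset; keep perms, device and inode. */
    if (sscanf(line, "%*lx-%*lx %4s %*lx %x:%x %lu",
               perms, &maj, &min, &line_ino) != 4)
        return 0;
    /* perms looks like "r-xp"; the third character is the exec bit. */
    return perms[2] == 'x' && maj == major_dev && min == minor_dev &&
           line_ino == ino;
}

/*
 * Hypothetical helper: count executable mappings of `path` in the
 * current process. Returns -1 if the file or the maps file can't be read.
 */
static int count_exec_mappings(const char *path)
{
    struct stat st;
    char line[4096];
    int count = 0;
    FILE *maps;

    if (stat(path, &st) != 0)
        return -1;
    maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;
    while (fgets(line, sizeof(line), maps))
        count += maps_line_maps_file_exec(line, major(st.st_dev),
                                          minor(st.st_dev), st.st_ino);
    fclose(maps);
    return count;
}
```

A zero count for the file passed to prctl() would correspond to the situation Cyrill describes below, where the new exe file is never mapped.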

Eric, thanks for the clarification. Let me speak from the CRIU perspective
(because the interface came from its needs): the former executable may no
longer exist at all. For such cases CRIU creates so-called "ghost" files,
which are unlinked right after they are opened. So we simply can't mmap the
former executable into memory.
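
The ghost-file idea can be illustrated with standard POSIX calls: create a file, then unlink it while keeping the descriptor open. The inode stays alive until the last descriptor is closed, but no path on disk leads to it. This is a minimal sketch, not CRIU's actual ghost-file code. `make_ghost_file()` and `fd_is_unlinked()` are hypothetical names, and the /tmp template used in the usage below is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temporary file and unlink it right away.
 * The returned fd keeps the inode alive; no directory entry refers to it.
 * `tmpl` must end in "XXXXXX" as mkstemp(3) requires.
 */
static int make_ghost_file(char *tmpl)
{
    int fd = mkstemp(tmpl);

    if (fd < 0)
        return -1;
    if (unlink(tmpl) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: 1 if the file behind fd has no links left, -1 on error. */
static int fd_is_unlinked(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_nlink == 0;
}
```

For example, `char tmpl[] = "/tmp/ghostXXXXXX"; int fd = make_ghost_file(tmpl);` yields a descriptor that can still be written and read, while `tmpl` no longer names anything on disk.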

Moreover, I would really _like_ to avoid this check: the interface was
designed precisely to behave as it does now and not read the original file
into memory. In CRIU, by the time we set up the exe link, the original
memory of the process has already been restored, so an additional mmap of
every executable is a pure waste of time.
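
For reference, the operation under discussion is `prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, fd, 0, 0)`. It points /proc/self/exe at an already opened file without mapping that file. The sketch below wraps the call in a hypothetical `set_exe_link()` helper and assumes a kernel of this era. On such a kernel, the caller needs CAP_SYS_RESOURCE and the new file must be executable. The kernel refuses the change with EBUSY while the old exe file is still mapped, and before this patch only one change per process is allowed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks in case the headers are older than the uapi values. */
#ifndef PR_SET_MM
#define PR_SET_MM 35
#endif
#ifndef PR_SET_MM_EXE_FILE
#define PR_SET_MM_EXE_FILE 13
#endif

/*
 * Hypothetical helper: make /proc/self/exe refer to `path` without
 * mmapping it. Returns 0 on success or a negative errno value.
 */
static int set_exe_link(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    int ret;

    if (fd < 0)
        return -errno;
    /* Pass full-width arguments: the kernel rejects non-zero arg4/arg5. */
    ret = prctl(PR_SET_MM, (unsigned long)PR_SET_MM_EXE_FILE,
                (unsigned long)fd, 0UL, 0UL) ? -errno : 0;
    close(fd);
    return ret;
}
```

Calling `set_exe_link("/proc/self/exe")` from an ordinary program is expected to fail. Without privileges the capability check rejects it, and with privileges the old executable is still mapped.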

> >>
> >> Effectively, that is what we already check the old file for: we refuse
> >> to replace the old mm_exe_file while a mapping of it still exists.
> >>
> >> I think a reasonable argument can be made that if the file is
> >> executable, and it is mmaped with executable pages, then exe_file is not
> >> a complete lie.
> >
> > I might be missing something obvious, so sorry for the question: when
> > CRIU sets up the old exe link, the inode we obtain from the file open is
> > not mapped into memory; the old exe is not read by anyone because it is
> > never actually executed. So I don't really understand which mapping we
> > should check here. Would you mind pointing me to it?
>
> That sounds like an out and out bug that should not be preserved.

No, it's done intentionally, as I explained above: we would like to avoid
reading the file twice (which is not always possible in the case of deleted
files).

> Of course we should mmap the executable and set it up so that it can be
> executed (at least as much as the executable was previously mapped).
> Anything else is a buggy restart, and lying to userspace.

In the same way, once someone ptraces the process, the exe link lies to
userspace. The exe link is accurate only for a brief moment: when the kernel
reads the ELF and maps it. After that, once you jump back into userspace,
it's game over: as we already know, something else entirely might be
running instead of the former executable.

>
> >> Which is the important part. At the end of the day how much can
> >> userspace trust /proc/pid/exe? If we are too lax it is just a random
> >> file descriptor we can not trust at all. At which point there is
> >> exactly no point in preserving it in checkpoint/restart, because nothing
> >> will trust or look at it.
> >
> > You know, I think we should not trust the exe link much, and in reality
> > we never could: this link is rather a hint pointing to the executable a
> > process used in its execve call. Once the process starts running, one
> > can't be sure that the code currently running comes from the file the
> > exe link points to. It's just a hint, suitable for debugging and for
> > getting a clean view of which processes are running on a non-compromised
> > system. Monitoring exe link changes won't help much if there is
> > malicious software running on the system.
>
> But it is not just a hint. It is a record of which executable we called
> execve on. Knowing which file was executed doesn't guarantee what is
> running now but it provides a very strong hint.

Exactly, it's a record of what was valid when the kernel performed execve
and started running the new content. Once we're back in userspace, this
data may be informative but not authoritative.

> At the end of a restart the state of a process should be (by
> definition) exactly the state the process was before a checkpoint
> and thus a state the original executable could have gotten into.
>
> I admit it is possible for an application to unmap itself. I honestly
> have not met that application (except perhaps criu).

Indeed, it's not common practice on modern machines. But you can do that;
you can even keep running when the former executable is no longer present
on disk.
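
Suppose an executable is removed from disk while its process keeps running. /proc/<pid>/exe still resolves, but readlink(2) reports the original path with " (deleted)" appended. The sketch below reads the link and checks for that marker. The helper names are hypothetical. The suffix test is only a heuristic, since a real file name could also end in " (deleted)".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DELETED_SUFFIX " (deleted)"

/*
 * Hypothetical helper: read the target of an exe-style link into buf.
 * Returns the length, or -1 on error or possible truncation.
 */
static ssize_t read_exe_link(const char *link, char *buf, size_t size)
{
    ssize_t n;

    if (size < 2)
        return -1;
    n = readlink(link, buf, size - 1);
    if (n < 0 || (size_t)n == size - 1)
        return -1;
    buf[n] = '\0';  /* readlink() does not NUL-terminate */
    return n;
}

/* Hypothetical helper: 1 if the target carries the " (deleted)" marker. */
static int exe_target_is_deleted(const char *target)
{
    size_t len = strlen(target);
    size_t slen = strlen(DELETED_SUFFIX);

    return len >= slen && strcmp(target + len - slen, DELETED_SUFFIX) == 0;
}
```

Reading `/proc/self/exe` into a buffer and passing it to `exe_target_is_deleted()` tells a process whether its own executable still has a name on disk.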

> >> If the only user is checkpoint/restart perhaps it should be only ptrace
> >> that can set this and not the process itself with a prctl. I don't
> >> know. All I know is that we should work on making it a very trustable
> >> value even though in some specific instances we can set it.
> >
> > Since, as I said, I suppose nobody except us uses this feature, we could
> > set up a sysctl knob for it (I personally think this is overkill, but
> > OTOH if people rely on the exe link and are not going to use CRIU at all,
> > this knob would help).
>
> Some clarity of thought came to me, and I apologise for not replying
> with it sooner.
>
> My problem with the original patch submission is that it was
> justifying changing prctl_set_mm_exe_file based on what
> prctl_set_mm_exe_file does today. As prctl_set_mm_exe_file was added
> for the checkpoint/restart case, that is justifying changing code based
> on a buggy implementation.

I don't think so: it's not buggy, it's the minimum we need to be able
to restore deleted files. I'll read the rest of the email tomorrow,
thank you for the comments!