Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]

From: Linus Torvalds
Date: Tue Apr 30 2019 - 12:19:40 EST


On Tue, Apr 30, 2019 at 1:21 AM Florian Weimer <fweimer@xxxxxxxxxx> wrote:
>
> > (In fact, if I recall correctly, the _reason_ we have an explicit
> > 'vfork()' entry point rather than using clone() with magic parameters
> > was that the lack of arguments meant that you didn't have to
> > save/restore any registers in user space, which made the whole stack
> > issue simpler. But it's been two decades, so my memory is bitrotting).
>
> That's an interesting point. Using a callback-style interface avoids
> that because you never need to restore the registers in the new
> subprocess. It's still appropriate to use an assembler implementation,
> I think, because it will be more obviously correct.

I agree that a callback interface would have been a whole lot more
obvious and less prone to subtle problems.

But if you want vfork() because the programs you want to build use it,
that's the interface you need.

Of course, if you *don't* need the exact vfork() semantics, clone
itself actually very much supports a callback model with a separate
stack. You can basically do this:

- allocate new stack for the child
- in trivial asm wrapper, do:
   - push the callback address on the child stack
   - clone(CLONE_VFORK|CLONE_VM|SIGCHLD, child_stack, NULL, NULL, NULL)
   - "ret"
- free new stack

where the "ret" in the child will just go to the callback, while the
parent (eventually) just returns from the trivial wrapper and frees
the new stack (which by definition is no longer used, since the child
has exited or execve'd).

So you can most definitely create a "vfork_with_child_callback()" with
clone, and it would arguably be a much superior interface to vfork()
anyway (maybe you'd like to pass in some arguments to the callback too
- add more stack setup for the child as needed), but it wouldn't be
the right solution for programs that just want to use the standard BSD
vfork() model.
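
As a rough sketch of that idea in C: glibc's clone() wrapper already
takes a function pointer and an argument, so it can stand in for the
trivial asm wrapper. The names vfork_with_child_callback(), child_fn(),
demo_exit_status() and CHILD_STACK_SIZE are made up for illustration and
are not existing interfaces; the sketch also assumes Linux, glibc, and a
stack that grows downward.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size for this sketch */

/* Hypothetical helper: not an existing libc or kernel interface. */
static pid_t vfork_with_child_callback(int (*fn)(void *), void *arg)
{
    /* Allocate a new stack for the child. */
    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /*
     * glibc's clone() runs fn(arg) in the child on the new stack.
     * CLONE_VFORK keeps the parent suspended until the child has
     * exited or exec'd; CLONE_VM shares the address space meanwhile.
     * The stack is assumed to grow down, so pass its top.
     */
    pid_t pid = clone(fn, stack + CHILD_STACK_SIZE,
                      CLONE_VFORK | CLONE_VM | SIGCHLD, arg);

    /* The child no longer uses the stack once clone() has returned. */
    int saved_errno = errno;
    munmap(stack, CHILD_STACK_SIZE);
    errno = saved_errno;
    return pid;
}

/* Example callback: with CLONE_VM this store lands in the parent's memory. */
static int child_fn(void *arg)
{
    *(volatile int *)arg = 42;
    return 7;   /* becomes the child's exit status */
}

/* Hypothetical demo: returns the child's exit status, or -1 on failure. */
static int demo_exit_status(int *shared)
{
    int status;
    pid_t pid = vfork_with_child_callback(child_fn, shared);

    if (pid < 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status));
    return WEXITSTATUS(status);
}
```

A real callback would normally end in execve() or _exit(); because the
child shares the parent's memory until then, it should do as little as
possible before that point.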

> vfork is also more benign from a memory accounting perspective. In some
> environments, it's not possible to call fork from a large process
> because the accounting assumes (conservatively) that the new process
> will dirty a lot of its private memory.

Indeed.

Linus